This book is a marriage of three of my passions: algorithms, Python programming, and explaining 
things. To me, all three of these are about aesthetics — finding just the right way of doing something, 
looking until you uncover a hint of elegance, and then polishing that until it shines. Or at least until it is a 
bit shinier. Of course, when there’s a lot of material to cover, you may not get to polish things quite as 
much as you want. Luckily, though, most of the content in this book is prepolished, because I'm writing 
about really beautiful algorithms and proofs, as well as one of the cutest programming languages out 
there. As for the third part, I’ve tried hard to find explanations that will make things seem as obvious as 
possible. Even so, I’m sure I have failed in many ways, and if you have suggestions for improving the 
book, I’d be happy to hear from you. Who knows, maybe some of your ideas could make it into a future 
edition? For now, though, I hope you have fun with what’s here and that you take any newfound insight 
and run with it. If you can, use it to make the world a more awesome place, in whatever way seems right. 




CHAPTER 1 



Introduction 



1. Write down the problem. 

2. Think real hard. 

3. Write down the solution. 



“The Feynman Algorithm” 
as described by Murray Gell-Mann 

Consider the following problem. You are to visit all the cities, towns, and villages of, say, Sweden and 
then return to your starting point. This might take a while (there are 24 978 locations to visit, after all), so 
you want to minimize your route. You plan on visiting each location exactly once, following the shortest 
route possible. As a programmer, you certainly don’t want to plot the route by hand. Rather, you try to 
write some code that will plan your trip for you. For some reason, however, you can’t seem to get it right. 
A straightforward program works well for a smaller number of towns and cities but seems to run forever 
on the actual problem, and improving the program turns out to be surprisingly hard. How come? 

Actually, in 2004, a team of five researchers 1 found such a tour of Sweden, after a number of other 
research teams had tried and failed. The five-man team used cutting-edge software with lots of clever 
optimizations and tricks of the trade, running on a cluster of 96 Xeon 2.6 GHz workstations. Their 
software ran from March 2003 until May 2004, before it finally printed out the optimal solution. Taking 
various interruptions into account, the team estimated that the total CPU time spent was about 85 years! 

Consider a similar problem: You want to get from Kashgar, in the westernmost regions of China, to 
Ningbo, on the east coast, following the shortest route possible. Now, China has 3 583 715 km of 
roadways and 77 834 km of railways, with millions of intersections to consider and a virtually 
unfathomable number of possible routes to follow. It might seem that this problem is related to the 
previous one, yet this shortest path problem is one solved routinely, with no appreciable delay, by GPS 
software and online map services. If you give those two cities to your favorite map service, you should 
get the shortest route in mere moments. What’s going on here? 

You will learn more about both of these problems later in the book; the first one is called the 
traveling salesman (or salesrep) problem and is covered in Chapter 11, while so-called shortest path 
problems are primarily dealt with in Chapter 9. I also hope you will gain a rather deep insight into why 
one problem seems like such a hard nut to crack while the other admits several well-known, efficient 
solutions. More importantly, you will learn something about how to deal with algorithmic and 
computational problems in general, either solving them efficiently, using one of the several techniques 
and algorithms you encounter in this book, or showing that they are too hard and that approximate 
solutions may be all you can hope for. This chapter briefly describes what the book is about: what you 
can expect and what is expected of you. It also outlines the specific contents of the various chapters to 
come in case you want to skip around. 

1 David Applegate, Robert Bixby, Vasek Chvatal, William Cook, and Keld Helsgaun 



What’s All This, Then? 

This is a book about algorithmic problem solving for Python programmers. Just like books on, say, 
object-oriented patterns, the problems it deals with are of a general nature, as are the solutions. Your 
task as an algorist will, in many cases, be more than simply to implement or execute an existing 
algorithm, as you would, for example, in solving an algebra problem. Instead, you are expected to come 
up with new algorithms: new general solutions to hitherto unseen, general problems. In this book, you 
are going to learn principles for constructing such solutions. 

This may not be your typical algorithm book, though. Most of the authoritative books on the subject 
(such as Knuth’s classics or the industry-standard textbook by Cormen et al.) have a heavy formal 
and theoretical slant, even though some of them (such as the one by Kleinberg and Tardos) lean more in 
the direction of readability. Instead of trying to replace any of these excellent books, I’d like to 
supplement them. Building on my experience from teaching algorithms, I try to explain as clearly as 
possible how the algorithms work and what common principles underlie many of them. For a 
programmer, these explanations are probably enough. Chances are you’ll be able to understand why the 
algorithms are correct and how to adapt them to new problems you may come to face. If, however, you 
need the full depth of the more formalistic and encyclopedic textbooks, I hope the foundation you get in 
this book will help you understand the theorems and proofs you encounter there. 

There is another genre of algorithm books as well: the “(Data Structures and) Algorithms in blank” 
kind, where the blank is the author’s favorite programming language. There are quite a few of these 
(especially for blank = Java, it seems), but many of them focus on relatively basic data structures, to the 
detriment of the more meaty stuff. This is understandable if the book is designed to be used in a basic 
course on data structures, for example, but for a Python programmer, learning about singly and doubly 
linked lists may not be all that exciting (although you will hear a bit about those in the next chapter). And 
even though techniques such as hashing are highly important, you get hash tables for free in the form of 
Python dictionaries; there’s no need to implement them from scratch. Instead, I focus on more high-level 
algorithms. Many important concepts that are available as black-box implementations either in the 
Python language itself or in the standard library (such as sorting, searching, and hashing) are explained 
more briefly, in special “black box” sidebars throughout the text. 

There is, of course, another factor that separates this book from those in the “Algorithms in 
Java/C/C++/C#” genre, namely, that the blank is Python. This places the book one step closer to the 
language-independent books (such as those by Knuth, 2 Cormen et al., and Kleinberg and Tardos, for 
example), which often use pseudocode, the kind of fake programming language that is designed to be 
readable rather than executable. One of Python’s distinguishing features is its readability; it is, more or 
less, executable pseudocode. Even if you’ve never programmed in Python, you could probably decipher 
the meaning of most basic Python programs. The code in this book is designed to be readable exactly in 
this fashion — you need not be a Python expert to understand the examples (although you might need to 
look up some built-in functions and the like). And if you want to pretend the examples are actually 
pseudocode, feel free to do so. To sum up . . . 

2 Knuth is also well-known for using assembly code for an abstract computer of his own design. 

What the book is about: 

• Algorithm analysis, with a focus on asymptotic running time 

• Basic principles of algorithm design 

• How to represent well-known data structures in Python 

• How to implement well-known algorithms in Python 

What the book covers only briefly or partially: 

• Algorithms that are directly available in Python, either as part of the language or 
via the standard library 

• Thorough and deep formalism (although the book has its share of proofs and 
proof-like explanations) 

What the book isn’t about: 3 

• Numerical or number-theoretical algorithms (except for some floating-point hints 
in Chapter 2) 

• Parallel algorithms and multicore programming 

As you can see, “implementing things in Python” is just part of the picture. The design principles and 
theoretical foundations are included in the hope that they’ll help you design your own algorithms and 
data structures. 



Why Are You Here? 

When working with algorithms, you’re trying to solve problems efficiently. Your programs should be fast; 
the wait for a solution should be short. But what, exactly, do we mean by efficient, fast, and short? And 
why would one care about these things in a language such as Python, which isn’t exactly lightning fast to 
begin with? Why not rather switch to, say, C or Java? 

First, Python is a lovely language, and you may not want to switch. Or maybe you have no choice in 
the matter. But second, and perhaps most importantly, algorists don’t primarily worry about constant 
differences in performance. 4 If one program takes twice, or even ten times, as long as another to finish, it 
may still be fast enough, and the slower program (or language) may have other desirable properties, such 
as being more readable. Tweaking and optimizing can be costly in many ways and is not a task to be 
taken on lightly. What does matter, though, no matter the language, is how your program scales. If you 
double the size of your input, what happens? Will your program run for twice as long? Four times? More? 
Will the running time double even if you add just one measly bit to the input? These are the kind of 
differences that will easily trump language or hardware choice, if your problems get big enough. And in 
some cases “big enough” needn’t be all that big. Your main weapon in whittling down the growth of your 
running time is — you guessed it — a solid understanding of algorithm design. 

Let’s try a little experiment. Fire up an interactive Python interpreter, and enter the following: 



3 Of course, the book is also not about a lot of other things 

4 I'm talking about constant multiplicative factors here, such as doubling or halving the execution time. 
>>> count = 10**5 
>>> nums = [] 
>>> for i in range(count): 
...     nums.append(i) 
... 
>>> nums.reverse() 

Not the most useful piece of code, perhaps. It simply appends a bunch of numbers to an (initially) 
empty list and then reverses that list. In a more realistic situation, the numbers might come from some 
outside source (they could be incoming connections to a server, for example), and you want to add them 
to your list in reverse order, perhaps to prioritize the most recent ones. Now you get an idea: instead of 
reversing the list at the end, couldn’t you just insert the numbers at the beginning, as they appear? 

Here’s an attempt to streamline the code (continuing in the same interpreter window): 

>>> nums = [] 
>>> for i in range(count): 
...     nums.insert(0, i) 
... 

Unless you’ve encountered this situation before, the new code might look promising, but try to run 
it. Chances are you’ll notice a distinct slowdown. On my computer, the second piece of code takes over 
100 times as long as the first to finish. Not only is it slower, but it also scales worse with the problem size. 
Try, for example, to increase count from 10**5 to 10**6. As expected, this increases the running time 
for the first piece of code by a factor of about ten . . . but the second version is slowed by roughly two 
orders of magnitude, making it more than a thousand times slower than the first! As you can probably 
guess, the discrepancy between the two versions only increases as the problem gets bigger, making the 
choice between them ever more crucial. 
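
If you would rather measure the difference than eyeball it, the experiment can be wrapped in two functions and timed with the standard timeit module. This is only a sketch: the function names append_then_reverse and insert_at_front are made up for the illustration, and the timings you see will depend on your machine and Python version.

```python
from timeit import timeit

def append_then_reverse(count):
    """Build the list by appending, then reverse it once at the end."""
    nums = []
    for i in range(count):
        nums.append(i)
    nums.reverse()
    return nums

def insert_at_front(count):
    """Build the same list by inserting every item at index 0."""
    nums = []
    for i in range(count):
        nums.insert(0, i)
    return nums

if __name__ == "__main__":
    # Each step multiplies the problem size by ten; compare how the
    # two columns of timings grow from one row to the next.
    for count in (10**3, 10**4, 10**5):
        t_append = timeit(lambda: append_then_reverse(count), number=1)
        t_insert = timeit(lambda: insert_at_front(count), number=1)
        print("%8d  %.4f s  %.4f s" % (count, t_append, t_insert))
```

Both functions produce the same list, so any difference in the two columns comes from how the list is built, not from what it ends up containing.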



Note This is an example of linear vs. quadratic growth, a topic dealt with in detail in Chapter 3. The specific 
issue underlying the quadratic growth is explained in the discussion of vectors (or dynamic arrays) in the black box 
sidebar on list in Chapter 2. 



Some Prerequisites 

This book is intended for two groups of people: Python programmers, who want to beef up their 
algorithmics, and students taking algorithm courses, who want a supplement to their plain-vanilla 
algorithms textbook. Even if you belong to the latter group, I’m assuming you have a familiarity with 
programming in general and with Python in particular. If you don’t, perhaps my book Beginning Python 
(which covers Python versions up to 3.0) can help? The Python web site also has a lot of useful material, 
and Python is a really easy language to learn. There is some math in the pages ahead, but you don’t have 
to be a math prodigy to follow the text. We’ll be dealing with some simple sums and nifty concepts such 
as polynomials, exponentials, and logarithms, but I’ll explain it all as we go along. 

Before heading off into the mysterious and wondrous lands of computer science, you should have 
your equipment ready. As a Python programmer, I assume you have your own favorite text/code editor 
or integrated development environment; I’m not going to interfere with that. When it comes to Python 
versions, the book is written to be reasonably version-independent, meaning that most of the code 
should work with both the Python 2 and 3 series. Where backward-incompatible Python 3 features are 
used, there will be explanations on how to implement the algorithm in Python 2 as well. (And if, for some 
reason, you’re still stuck with, say, the Python 1.5 series, most of the code should still work, with a tweak 
here and there.) 



GETTING WHAT YOU NEED 



In some operating systems, such as Mac OS X and several flavors of Linux, Python should already be 
installed. If it is not, most Linux distributions will let you install the software you need through some form 
of package manager. If you want or need to install Python manually, you can find all you need on the 
Python web site, http://python.org. 



What’s in This Book 

The book is structured as follows: 

Chapter 1: Introduction. You’ve already gotten through most of this. It gives an overview of the book. 

Chapter 2: The Basics. This covers the basic concepts and terminology, as well as some fundamental 
math. Among other things, you learn how to be sloppier with your formulas than ever before, and still 
get the right results, with asymptotic notation. 

Chapter 3: Counting 101. More math — but it’s really fun math, I promise! There’s some basic 
combinatorics for analyzing the running time of algorithms, as well as a gentle introduction to recursion 
and recurrence relations. 

Chapter 4: Induction and Recursion ... and Reduction. The three terms in the title are crucial, and they 
are closely related. Here we work with induction and recursion, which are virtually mirror images of each 
other, both for designing new algorithms and for proving correctness. We also have a somewhat briefer 
look at the idea of reduction, which runs as a common thread through almost all algorithmic work. 

Chapter 5: Traversal: A Skeleton Key to Algorithmics. Traversal can be understood using the ideas of 
induction and recursion, but it is in many ways a more concrete and specific technique. Several of the 
algorithms in this book are simply augmented traversals, so mastering traversal will give you a real 
jump start. 

Chapter 6: Divide, Combine, and Conquer. When problems can be decomposed into independent 
subproblems, you can recursively solve these subproblems and usually get efficient, correct algorithms 
as a result. This principle has several applications, not all of which are entirely obvious, and it is a mental 
tool well worth acquiring. 

Chapter 7: Greed is Good? Prove It! Greedy algorithms are usually easy to construct. One can even 
formulate a general scheme that most, if not all, greedy algorithms follow, yielding a plug-and-play 
solution. Not only are they easy to construct, but they are usually very efficient. The problem is, it can be 
hard to show that they are correct (and often they aren’t). This chapter deals with some well-known 
examples and some more general methods for constructing correctness proofs. 

Chapter 8: Tangled Dependencies and Memoization. This chapter is about the design method (or, 
historically, the problem) called, somewhat confusingly, dynamic programming. It is an advanced 
technique that can be hard to master but that also yields some of the most enduring insights and elegant 
solutions in the field. 

Chapter 9: From A to B with Edsger and Friends. Rather than the design methods of the previous three 
chapters, we now focus on a specific problem, with a host of applications: finding shortest paths in 
networks, or graphs. There are many variations of the problem, with corresponding (beautiful) 
algorithms. 

Chapter 10: Matchings, Cuts, and Flows. How do you match, say, students with colleges so you 
maximize total satisfaction? In an online community, how do you know whom to trust? And how do you 
find the total capacity of a road network? These, and several other problems, can be solved with a small 
class of closely related algorithms and are all variations of the maximum flow problem, which is covered 
in this chapter. 

Chapter 11: Hard Problems and (Limited) Sloppiness. As alluded to in the beginning of the 
introduction, there are problems we don’t know how to solve efficiently and that we have reasons to 
think won’t be solved for a long time — maybe never. In this chapter, you learn how to apply the trusty 
tool of reduction in a new way: not to solve problems but to show that they are hard. Also, we have a 
look at how a bit of (strictly limited) sloppiness in our optimality criteria can make problems a lot 
easier to solve. 

Appendix A: Pedal to the Metal: Accelerating Python. The main focus of this book is asymptotic 
efficiency — making your programs scale well with problem size. However, in some cases, that may not 
be enough. This appendix gives you some pointers to tools that can make your Python programs go 
faster. Sometimes a lot (as in hundreds of times) faster. 

Appendix B: List of Problems and Algorithms. This appendix gives you an overview of the algorithmic 
problems and algorithms discussed in the book, with some extra information to help you select the right 
algorithm for the problem at hand. 

Appendix C: Graph Terminology and Notation. Graphs are a really useful structure, both in describing 
real-world systems and in demonstrating how various algorithms work. This chapter gives you a tour of 
the basic concepts and lingo, in case you haven’t dealt with graphs before. 

Appendix D: Hints for Exercises. Just what the title says. 



Summary 

Programming isn’t just about software architecture and object-oriented design; it’s also about 
solving algorithmic problems, some of which are really hard. For the more run-of-the-mill problems 
(such as finding the shortest path from A to B), the algorithm you use or design can have a huge 
impact on the time your code takes to finish, and for the hard problems (such as finding the shortest 
route through A-Z), there may not even be an efficient algorithm, meaning that you need to accept 
approximate solutions. 

This book will teach you several well-known algorithms, along with general principles that will help 
you create your own. Ideally, this will let you solve some of the more challenging problems out there, as 
well as create programs that scale gracefully with problem size. In the next chapter, we get started with 
the basic concepts of algorithmics, dealing with terms that will be used throughout the entire book. 



If You’re Curious ... 

This is a section you’ll see in all the chapters to come. It’s intended to give you some hints about details, 
wrinkles, or advanced topics that have been omitted or glossed over in the main text and point you in 
the direction of further information. For now, I’ll just refer you to the “References” section, later in this 
chapter, which gives you details about the algorithm books mentioned in the main text. 



Exercises 

As with the previous section, this is one you’ll encounter again and again. Hints for solving the exercises 
can be found at the back of the book. The exercises often tie in with the main text, covering points that 
aren’t explicitly discussed there but that may be of interest or that deserve some contemplation. If you 
want to really sharpen your algorithm design skills, you might also want to check out some of the myriad 
of sources of programming puzzles out there. There are, for example, lots of programming contests (a 
web search should turn up plenty), many of which post problems that you can play with. Many big 
software companies also have qualification tests based on problems such as these and publish some of 
them online. 

Because the introduction doesn’t cover that much ground, I'll just give you a couple of exercises 
here — a taste of what’s to come: 

1-1. Consider the following statement: “As machines get faster and memory cheaper, algorithms become 
less important.” What do you think; is this true or false? Why? 

1-2. Find a way of checking whether two strings are anagrams of each other (such as "debit card" and 
"bad credit"). How well do you think your solution scales? Can you think of a naive solution that will 
scale very poorly? 



Tracey: I didn’t know you were out there. 

Zoe: Sort of the point. Stealth — you may have heard of it. 

Tracey: I don’t think they covered that in basic. 

From “The Message,” episode 14 of Firefly 

Before moving on to the mathematical techniques, algorithmic design principles, and classical 
algorithms that make up the bulk of this book, we need to go through some basic principles and 
techniques. When you start reading the following chapters, you should be clear on the meaning of 
phrases such as “directed, weighted graph without negative cycles” and "a running time of Θ(n lg n)." 
You should also have an idea of how to implement some fundamental structures in Python. 

Luckily, these basic ideas aren’t at all hard to grasp. The main two topics of the chapter are 
asymptotic notation, which lets you focus on the essence of running times, and ways of representing 
trees and graphs in Python. There is also practical advice on timing your programs and avoiding some 
basic traps. First, though, let’s take a look at the abstract machines we algorists tend to use when 
describing the behavior of our algorithms. 



Some Core Ideas in Computing 

In the mid-1930s the English mathematician Alan Turing published a paper called “On computable 
numbers, with an application to the Entscheidungsproblem” 1 and, in many ways, laid the groundwork 
for modern computer science. His abstract Turing machine has become a central concept in the theory 
of computation, in great part because it is intuitively easy to grasp. A Turing machine is a simple 
(abstract) device that can read from, write to, and move along an infinitely long strip of paper. The actual 
behavior of the machines varies. Each is a so-called finite state machine: it has a finite set of states (some 
of which indicate that it has finished), and every symbol it reads potentially triggers reading and/or 
writing and switching to a different state. You can think of this machinery as a set of rules. (“If I am in 
state 4 and see an X, I move one step to the left, write a Y, and switch to state 9.”) Although these 
machines may seem simple, they can, surprisingly enough, be used to implement any form of 
computation anyone has been able to dream up so far, and most computer scientists believe they 
encapsulate the very essence of what we think of as computing. 
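
To make the "set of rules" idea concrete, here is a minimal sketch of a Turing machine simulator. Everything in it is an assumption made for the illustration: the function name run_turing_machine, the state names "start" and "halt", the underscore as the blank symbol, and the convention that each rule writes a symbol first and then moves the head. The example machine simply flips every bit on its tape and stops at the first blank.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=10000):
    """Run a single-tape machine until it enters the state "halt".

    rules maps (state, symbol) to (symbol_to_write, move, next_state),
    where move is -1 (one step left) or +1 (one step right).
    """
    cells = dict(enumerate(tape))      # sparse tape: position -> symbol
    pos, steps = 0, 0
    while state != "halt":
        if steps == max_steps:         # guard against machines that never stop
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells.get(pos, blank)
        write, move, state = rules[state, symbol]
        cells[pos] = write
        pos += move
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)

# Hypothetical example machine: invert each bit, halt at the first blank.
FLIP_BITS = {
    ("start", "0"): ("1", +1, "start"),
    ("start", "1"): ("0", +1, "start"),
    ("start", "_"): ("_", -1, "halt"),
}

print(run_turing_machine(FLIP_BITS, "1011"))
```

The dictionary plays the role of the rule table: each entry reads as "if I am in this state and see this symbol, write that, move that way, and switch to that state." The max_steps guard is there because, in general, there is no way to know in advance whether a given machine will ever finish.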



1 The Entscheidungsproblem is a problem posed by David Hilbert, which basically asks whether an algorithm exists 
that can decide, in general, whether a mathematical statement is true or false. Turing (and Alonzo Church before him) 
showed that such an algorithm cannot exist. 
An algorithm is a procedure, consisting of a finite set of steps (possibly including loops and 
conditionals) that solves a given problem in finite time. A Turing machine is a formal description of 
exactly what problem an algorithm solves, 2 and the formalism is often used when discussing which 
problems can be solved (either at all or in reasonable time, as discussed later in this chapter and in 
Chapter 11). For more fine-grained analysis of algorithmic efficiency, however, Turing machines are not 
usually the first choice. Instead of scrolling along a paper tape, we use a big chunk of memory that can 
be accessed directly. The resulting machine is commonly known as the random-access machine. 

While the formalities of the random-access machine can get a bit complicated, we just need to know 
something about the limits of its capabilities so we don’t cheat in our algorithm analyses. The machine is 
an abstract, simplified version of a standard, single-processor computer, with the following properties: 

• We don’t have access to any form of concurrent execution; the machine simply 
executes one instruction after the other. 

• Standard, basic operations (such as arithmetic, comparisons, and memory access) 
all take constant (although possibly different) amounts of time. There are no more 
complicated basic operations (such as sorting). 

• One computer word (the size of a value that we can work with in constant time) is 
not unlimited but is big enough to address all the memory locations used to 
represent our problem, plus an extra percentage for our variables. 

In some cases, we may need to be more specific, but this machine sketch should do for the moment. 

We now have a bit of an intuition for what algorithms are, as well as the (abstract) hardware we’ll 
be running them on. The last piece of the puzzle is the notion of a problem. For our purposes, a 
problem is a relation between input and output. This is, in fact, much more precise than it might sound: 
a relation (in the mathematical sense) is a set of pairs — in our case, which outputs are acceptable for 
which inputs — and by specifying this relation, we’ve got our problem nailed down. For example, the 
problem of sorting may be specified as a relation between two sets, A and B, each consisting of 
sequences. 3 Without describing how to perform the sorting (that would be the algorithm), we can 
specify which output sequences (elements of B) would be acceptable, given an input sequence (an 
element of A). We would require that the result sequence consisted of the same elements as the input 
sequence and that the elements of the result sequence were in increasing order (each bigger than or 
equal to the previous). The elements of A here (that is, the inputs) are called problem instances; the 
relation itself is the actual problem. 
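
The sorting relation just described can be written down directly as a membership test, without saying anything about how to sort. The sketch below is only an illustration; the name is_acceptable_sort_output is made up, and it assumes the elements are hashable (so they can be counted) and that both arguments are lists or tuples.

```python
from collections import Counter

def is_acceptable_sort_output(inp, out):
    """Return True if the pair (inp, out) belongs to the sorting relation."""
    # Same elements, with the same multiplicities, in input and output.
    same_elements = Counter(inp) == Counter(out)
    # Each element is bigger than or equal to the previous one.
    increasing = all(a <= b for a, b in zip(out, out[1:]))
    return same_elements and increasing

print(is_acceptable_sort_output([3, 1, 2], [1, 2, 3]))
print(is_acceptable_sort_output([3, 1, 2], [1, 3, 2]))
```

Note that the function checks a proposed answer; it does not produce one. That is exactly the split between the problem (the relation) and an algorithm (some procedure that, for every instance, finds an output the relation accepts).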

To get our machine to work with a problem, we need to encode the input as zeros and ones. We 
won’t worry too much about the details here, but the idea is important, because the notion of running 
time complexity (as described in the next section) is based on knowing how big a problem instance is, 
and that size is simply the amount of memory needed to encode it. (As you’ll see, the exact nature of this 
encoding usually won’t matter.) 



Asymptotic Notation 

Remember the append versus insert example in Chapter 1? Somehow, adding items to the end of a list 
scaled better with the list size than inserting them at the front (see the nearby black box sidebar on list 
for an explanation). These built-in operations are both written in C, but assume for a minute that you 
reimplement list.append in pure Python; let’s say (arbitrarily) that the new version is 50 times slower 
than the original. Let’s also say that you run your slow, pure-Python append-based version on a really 
slow machine, while the fast, optimized, insert-based version is run on a computer that is 1 000 times 
faster. Now the speed advantage of the insert version is a factor of 50 000. You compare the two 
implementations by inserting 100 000 numbers. What do you think happens? 

2 There are also Turing machines that don’t solve any problems: machines that simply never stop. These still 
represent what we might call programs, but we usually don’t call them algorithms. 

3 Because input and output are of the same type, we could actually just specify a relation between A and A. 

Intuitively, it might seem obvious that the speedy solution should win, but its “speediness” is just a 
constant factor, and its running time grows faster than the “slower” one. For the example at hand, the 
Python-coded version running on the slower machine will, actually, finish in half the time of the other 
one. Let’s increase the problem size a bit, to 10 million numbers, for example. Now the Python version 
on the slow machine will be 2000 times faster than the C version on the fast machine. That’s like the 
difference between running for about a minute and running almost a day and a half! 

This distinction between constant factors (related to such things as general programming language 
performance and hardware speed, for example) and the growth of the running time, as problem sizes 
increase, is of vital importance in the study of algorithms. Our focus is on the big picture — the 
implementation-independent properties of a given way of solving a problem. We want to get rid of 
distracting details and get down to the core differences, but in order to do so, we need some formalism. 



BLACK BOX: LIST 



Python lists aren’t really lists in the traditional (computer science) sense of the word, and that explains the 
puzzle of why append is so much more efficient than insert. A classical list (a so-called linked list) is 
implemented as a series of nodes, each (except for the last) keeping a reference to the next. A simple 
implementation might look something like this: 

class Node: 
    def __init__(self, value, next=None): 
        self.value = value 
        self.next = next 

You construct a list by specifying all the nodes: 

>>> L = Node("a", Node("b", Node("c", Node("d")))) 
>>> L.next.next.value 
'c' 

This is a so-called singly linked list; each node in a doubly linked list would also keep a reference to the 
previous node. 

The underlying implementation of Python’s list type is a bit different. Instead of several separate nodes 
referencing each other, a list is basically a single, contiguous slab of memory — what is usually known as 
an array. This leads to some important differences from linked lists. For example, while iterating over the 
contents of the list is equally efficient for both kinds (except for some overhead in the linked list), directly 
accessing an element at a given index is much more efficient in an array. This is because the position of 
the element can be calculated, and the right memory location can be accessed directly. In a linked list, 
however, one would have to traverse the list from the beginning. 
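
To make the difference in index access concrete, here is a minimal sketch. The helper linked_get is a hypothetical name of my own (it is not part of any library); it simply walks the links one at a time, while the Python list jumps straight to the requested position.

```python
class Node:
    """A singly linked list node, as in the sidebar."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def linked_get(head, index):
    """Return the value at position index by walking from the head.

    Hypothetical helper for illustration; it assumes 0 <= index < length.
    """
    node = head
    for _ in range(index):      # One hop per position: linear in index
        node = node.next
    return node.value

L = Node("a", Node("b", Node("c", Node("d"))))
array = ["a", "b", "c", "d"]

print(linked_get(L, 2))         # Walks two links to reach the third node
print(array[2])                 # Position is computed directly
```

The linked version needs as many steps as the index is large, which is why indexing deep into a long linked list is slow compared to indexing an array.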

The difference we’ve been bumping up against, though, has to do with insertion. In a linked list, once you
know where you want to insert something, insertion is cheap; it takes (roughly) the same amount of time,
no matter how many elements the list contains. Not so with arrays: an insertion would have to move all
elements that are to the right of the insertion point, possibly even moving all the elements to a larger array, 
if needed. A specific solution for appending is to use what’s often called a dynamic array, or vector. 4 The 
idea is to allocate an array that is too big and then to reallocate it (in linear time) whenever it overflows. It 
might seem that this makes the append just as bad as the insert. In both cases, we risk having to move a 
large number of elements. The main difference is that it happens less often with the append. In fact, if we 
can ensure that we always move to an array that is bigger than the last by a fixed percentage (say 20 
percent or even 100 percent), the average cost (or, more correctly, the amortized cost, averaged over 
many appends) is negligible (constant). 
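
The amortized argument can be checked with a small simulation. This is only a sketch under my own assumptions (the function name count_copies and the exact growth policy are illustrative; it is not how CPython actually resizes its lists): it counts how many element copies all the reallocations cause when the capacity grows by a fixed factor.

```python
def count_copies(n_appends, growth=2.0):
    """Simulate n_appends appends to a dynamic array and return the
    total number of element copies caused by reallocations."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n_appends):
        if size == capacity:                 # The array is full: reallocate
            # Grow by the given factor, but always by at least one slot
            capacity = max(capacity + 1, int(capacity * growth))
            copies += size                   # Every element moves (linear)
        size += 1
    return copies

for n in (1000, 10000, 100000):
    print(n, count_copies(n) / n)            # Copies per append stay bounded
```

With doubling, the total number of copies stays below 2n, so the cost per append, averaged over all appends, is bounded by a constant.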



It’s Greek to Me! 

Asymptotic notation has been in use (with some variations) since the late 19th century and is an 
essential tool in analyzing algorithms and data structures. The core idea is to represent the resource 
we’re analyzing (usually time but sometimes also memory) as a function, with the input size as its 
parameter. For example, we could have a program with a running time of T(n) = 2.4n + 7. 

An important question arises immediately: what are the units here? It might seem trivial whether we 
measure the running time in seconds or milliseconds or whether we use bits or megabytes to represent 
problem size. The somewhat surprising answer, though, is that not only is it trivial, but it actually will not 
affect our results at all. We could measure time in Jovian years and problem size in kg (presumably the
mass of the storage medium used), and it will not matter. This is because our original intention of
ignoring implementation details carries over to these factors as well: the asymptotic notation ignores
them all! (We do normally assume that the problem size is a positive integer, though.)

What we often end up doing is letting the running time be the number of times a certain basic 
operation is performed, while problem size is either the number of items handled (such as the number 
of integers to be sorted, for example) or, in some cases, the number of bits needed to encode the 
problem instance in some reasonable encoding. 





Forgetting. Of course, the assert doesn’t work. (http://xkcd.com/379)



4 For an “out-of-the-box” solution for inserting objects at the beginning of a sequence, see the black box sidebar on
deque in Chapter 5.

Note Exactly how you encode your problems and solutions as bit patterns usually has little effect on the
asymptotic running time, as long as you are reasonable. For example, avoid representing your numbers in the
unary number system (1=1, 2=11, 3=111, ...).



The asymptotic notation consists of a bunch of operators, written as Greek letters. The most
important ones (and the only ones we’ll be using) are O (originally an omicron but now usually called
“Big Oh”), Ω (omega), and Θ (theta). The definition for the O operator can be used as a foundation for
the other two. The expression O(g), for some function g(n), represents a set of functions, and a function
f(n) is in this set if it satisfies the following condition: there exists a natural number n₀ and a positive
constant c such that

f(n) ≤ cg(n)

for all n ≥ n₀. In other words, if we’re allowed to tweak the constant c (for example, by running the
algorithms on machines of different speeds), the function g will eventually (that is, at n₀) grow bigger
than f. See Figure 2-1 for an example.

This is a fairly straightforward and understandable definition, although it may seem a bit foreign at
first. Basically, O(g) is the set of functions that do not grow faster than g. For example, the function n² is
in the set O(n²), or, in set notation, n² ∈ O(n²). We often simply say that n² is O(n²).

The fact that n² does not grow faster than itself is not particularly interesting. More useful, perhaps,
is the fact that neither 2.4n² + 7 nor the linear function n does. That is, we have both

2.4n² + 7 ∈ O(n²)

and

n ∈ O(n²).




Figure 2-1. For values of n greater than n₀, T(n) is less than cn², so T(n) is O(n²).
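
The definition can be tried out numerically. The sketch below checks the inequality for T(n) = 2.4n² + 7 and g(n) = n². The witnesses c = 3 and n0 = 4 are my own picks (many others would work), and a finite scan is of course only a sanity check, not a proof.

```python
def T(n):
    return 2.4 * n**2 + 7

def g(n):
    return n**2

c, n0 = 3, 4    # Hypothetical witnesses chosen by hand

# The definition demands T(n) <= c*g(n) for every n >= n0. The scan below
# covers only a finite range; the real argument is that 0.6*n**2 >= 7
# holds whenever n >= 4.
assert all(T(n) <= c * g(n) for n in range(n0, 10000))

print(T(3) <= c * g(3))     # The bound need not hold below n0
```

Picking a larger c would allow a smaller n0, which is exactly the kind of tweaking the definition permits.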







CHAPTER 2 THE BASICS 



The first example shows us that we are now able to represent a function without all its bells and
whistles; we can drop the 2.4 and the 7 and simply express the function as O(n²), which gives us just the
information we need. The second shows us that O can be used to express loose limits as well: any
function that is better (that is, doesn’t grow faster) than g can be found in O(g).

How does this relate to our original example? Well, the thing is, even though we can’t be sure of the
details (after all, they depend on both the Python version and the hardware you’re using), we can
describe the operations asymptotically: the running time of appending n numbers to a Python list is
O(n), while inserting n numbers at its beginning is O(n²).

The other two, Ω and Θ, are just variations of O. Ω is its complete opposite: a function f is in Ω(g) if it
satisfies the following condition: there exists a natural number n₀ and a positive constant c such that

f(n) ≥ cg(n)

for all n ≥ n₀. So, where O forms a so-called asymptotic upper bound, Ω forms an asymptotic lower
bound.



Note Our first two asymptotic operators, O and Ω, are each other’s inverses: if f is O(g), then g is Ω(f). Exercise
1-3 asks you to show this.



The sets formed by Θ are simply intersections of the other two, that is, Θ(g) = O(g) ∩ Ω(g). In other
words, a function f is in Θ(g) if it satisfies the following condition: there exists a natural number n₀ and
two positive constants c₁ and c₂ such that

c₁g(n) ≤ f(n) ≤ c₂g(n)

for all n ≥ n₀. This means that f and g have the same asymptotic growth. For example, 3n² + 2 is Θ(n²), but
we could just as well write that n² is Θ(3n² + 2). By supplying an upper bound and a lower bound at the
same time, the Θ operator is the most informative of the three, and I will use it when possible.
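
The “sandwich” in the Θ definition can be checked the same way as the O bound. In this sketch the constants c1 = 3, c2 = 4, and n0 = 2 are assumptions of mine, chosen by hand for f(n) = 3n² + 2 and g(n) = n².

```python
def f(n):
    return 3 * n**2 + 2

def g(n):
    return n**2

c1, c2, n0 = 3, 4, 2    # Hypothetical constants chosen by hand

def sandwiched(n):
    """True if c1*g(n) <= f(n) <= c2*g(n)."""
    return c1 * g(n) <= f(n) <= c2 * g(n)

print(all(sandwiched(n) for n in range(n0, 10000)))
print(sandwiched(1))    # The upper bound fails below n0
```

The lower bound holds for every n here; it is only the upper bound that needs n to reach n0, because 3n² + 2 ≤ 4n² requires n² ≥ 2.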



Rules of the Road 

While the definitions of the asymptotic operators can be a bit tough to use directly, they actually lead to 
some of the simplest math ever. You can drop all multiplicative and additive constants, as well as all 
other “small parts” of your function, which simplifies things a lot. 

As a first step in juggling these asymptotic expressions, let’s take a look at some typical asymptotic 
classes, or orders. Table 2-1 lists some of these, along with their names and some typical algorithms with 
these asymptotic running times, also sometimes called running-time complexities. (If your math is a 
little rusty, you could take a look at the sidebar named “A Quick Math Refresher” later in the chapter.) An 
important feature of this table is that the complexities have been ordered so that each row dominates the 
previous one: if f is found higher in the table than g, then f is O(g). 5 



5 For the “Cubic” and “Polynomial” row, this holds only when k ≥ 3.

Note Actually, the relationship is even stricter: f is o(g), where the “Little Oh” is a stricter version of “Big Oh.”
Intuitively, instead of “doesn’t grow faster than,” it means “grows slower than.” Formally, it states that f(n)/g(n)
converges to zero as n grows to infinity. You don’t really need to worry about this, though.



Any polynomial (that is, with any power k > 0, even a fractional one) dominates any logarithm (that
is, with any base), and any exponential (with any base k > 1) dominates any polynomial (see Exercises
2-5 and 2-6). Actually, all logarithms are asymptotically equivalent: they differ only by constant factors
(see Exercise 2-4). Polynomials and exponentials, however, have different asymptotic growth depending
on their exponents or bases, respectively. So, n⁵ grows faster than n⁴, and 5ⁿ grows faster than 4ⁿ.
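
A quick experiment can make these dominance claims feel less abstract. The sketch below (the specific functions n⁵ and 2ⁿ and the scanned range are my own choices) finds the last point where a polynomial still keeps up with an exponential, and shows that logarithms of two different bases differ by a constant factor.

```python
from math import log

# Largest n in the scanned range for which the polynomial n**5 is still
# at least as large as the exponential 2**n. Beyond this point, the
# exponential stays ahead.
last = max(n for n in range(1, 200) if n**5 >= 2**n)
print(last)

# Logarithms of different bases differ only by a constant factor:
# the ratio lg n / log10 n is the same for every n.
ratios = [log(n, 2) / log(n, 10) for n in (10, 1000, 10**6)]
print(ratios)
```

The exponential starts out behind but overtakes the polynomial for good after a couple of dozen steps, even though the polynomial has a rather high power.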

The table primarily uses Θ notation, but the terms polynomial and exponential are a bit special,
because of the role they play in separating tractable (“solvable”) problems from intractable
(“unsolvable”) ones, as discussed in Chapter 11. Basically, an algorithm with a polynomial running time
is considered feasible, while an exponential one is generally useless. Although this isn’t entirely true in
practice (Θ(n¹⁰⁰) is no more practically useful than Θ(2ⁿ)), it is, in many cases, a useful distinction. 6
Because of this division, any running time in O(nᵏ), for any k > 0, is called polynomial, even though the
limit may not be tight. For example, even though binary search (explained in the black box sidebar on
bisect in Chapter 6) has a running time of Θ(lg n), it is still said to be a polynomial-time (or just
polynomial) algorithm. Conversely, any running time in Ω(kⁿ), even one that is, say, Θ(n!), is said to
be exponential.



Table 2-1. Common Examples of Asymptotic Running Times

Complexity   Name          Examples, Comments
Θ(1)         Constant      Hash table lookup and modification (see black box sidebar on dict).
Θ(lg n)      Logarithmic   Binary search (see Chapter 6). Logarithm base unimportant.
Θ(n)         Linear        Iterating over a list.
Θ(n lg n)    Loglinear     Optimal sorting of arbitrary values (see Chapter 6). Same as Θ(lg n!).
Θ(n²)        Quadratic     Comparing n objects to each other (see Chapter 3).
Θ(n³)        Cubic         Floyd and Warshall’s algorithms (see Chapters 8 and 9).
O(nᵏ)        Polynomial    k nested for loops over n (if k is pos. integer). For any constant k > 0.
Ω(kⁿ)        Exponential   Producing every subset of n items (k = 2; see Chapter 3). Any k > 1.
Θ(n!)        Factorial     Producing every ordering of n values.



6 Interestingly, once a problem is shown to have a polynomial solution, an efficient polynomial solution can quite 
often be found as well. 
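
The last two rows of Table 2-1 are easy to try out with the standard itertools module. In this sketch, the helper name subsets is my own; the counts it produces grow as 2ⁿ and n!, respectively, so keep n small.

```python
from itertools import combinations, permutations

def subsets(items):
    """Yield every subset of items (as tuples): 2**n of them for n items."""
    for size in range(len(items) + 1):
        yield from combinations(items, size)

items = ["a", "b", "c", "d"]
print(len(list(subsets(items))))        # 2**4 subsets
print(len(list(permutations(items))))   # 4! orderings
```

Adding a single item doubles the number of subsets and multiplies the number of orderings by the new length.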

Now that we have an overview of some important orders of growth, we can formulate two
simple rules:

• In a sum, only the dominating summand matters.

For example, Θ(n² + n³ + 42) = Θ(n³).

• In a product, constant factors don’t matter.

For example, Θ(4.2n lg n) = Θ(n lg n).

In general, we try to keep the asymptotic expressions as simple as possible, eliminating as many
unnecessary parts as we can. For O and Ω, there is a third principle we usually follow:

• Keep your upper or lower limits tight.

In other words, we try to make the upper limits low and the lower limits high.

For example, although n² might technically be O(n³), we usually prefer the
tighter limit, O(n²). In most cases, though, the best thing is to simply use Θ.
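
The sum rule can be illustrated numerically. This is only a rough sketch using the example function from the first rule: if n³ really is the dominating summand, the ratio between the full sum and n³ should settle down to a constant (here, 1) as n grows.

```python
def f(n):
    return n**2 + n**3 + 42

for n in (10, 1000, 10**6):
    print(n, f(n) / n**3)   # The ratio approaches 1 as n grows
```

The smaller terms contribute 1/n + 42/n³ to the ratio, which vanishes for large n.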

A practice that can make asymptotic expressions even more useful is that of using them instead of
actual values, in arithmetic expressions. Although this is technically incorrect (each asymptotic
expression yields a set of functions, after all), it is quite common. For example, Θ(n²) + Θ(n³) simply
means f + g, for some (unknown) functions f and g, where f is Θ(n²) and g is Θ(n³). Even though we
cannot find the exact sum f + g, because we don’t know the exact functions, we can find the asymptotic
expression to cover it, as illustrated by the following two “bonus rules”:

• Θ(f) + Θ(g) = Θ(f + g)

• Θ(f) · Θ(g) = Θ(f · g)

Exercise 2-8 asks you to show that these are correct. 

Taking the Asymptotics for a Spin 

Let’s take a look at some very simple programs and see whether we can determine their asymptotic 
running times. To begin with, let’s consider programs where the (asymptotic) running time varies only 
with the problem size, not the specifics of the instance in question. (The next section deals with what 
happens if the actual contents of the instances matter to the running time.) This means, for example, 
that if statements are rather irrelevant for now. What’s important is loops, in addition to straightforward 
code blocks. Function calls don’t really complicate things; just calculate the complexity for the call
and insert it at the right place.



Note There is one situation where function calls can trip us up: when the function is recursive. This case is dealt 
with in Chapters 3 and 4. 



The loop-free case is simple: we are executing one statement before another, so their complexities
are added. Let’s say, for example, that we know that for a list of size n, a call to append is Θ(1), while a
call to insert at position 0 is Θ(n). Consider the following little two-line program fragment, where nums is
a list of size n:

nums.append(1)
nums.insert(0,2)

We know that the first line takes constant time. At the time we get to the second line, the list size has
changed and is now n + 1. This means that the complexity of the second line is Θ(n + 1), which is the
same as Θ(n). Thus, the total running time is the sum of the two complexities, Θ(1) + Θ(n) = Θ(n).

Now, let’s consider some simple loops. Here’s a plain for loop over a sequence with n elements 
(numbers, say): 7 

s = 0
for x in seq:
    s += x

This is a straightforward implementation of what the sum function does: it iterates over seq and adds 
the elements to the starting value in s. This performs a single constant-time operation (s += x) for each of 
the n elements of seq, which means that its running time is linear, or Θ(n). Note that the constant-time 
initialization (s = 0) is dominated by the loop here. 

The same logic applies to the “camouflaged” loops we find in list (or set or dict) comprehensions 
and generator expressions, for example. The following list comprehension also has a linear running-time 
complexity: 

squares = [x**2 for x in seq] 

Several built-in functions and methods also have “hidden” loops in them. This generally applies to 
any function or method that deals with every element of a container, such as sum or map, for example. 

Things get a little bit (but not a lot) trickier when we start nesting loops. Let’s say we want to sum up 
all possible products of the elements in seq, for example: 

s = 0
for x in seq:
    for y in seq:
        s += x*y

One thing worth noting about this implementation is that each product will be added twice (if 42 
and 333 are both in seq, for example, we’ll add both 42*333 and 333*42). That doesn't really affect the 
running time (it’s just a constant factor). 

What’s the running time now? The basic rule is easy: the complexities of code blocks executed one
after the other are just added. The complexities of nested loops are multiplied. The reasoning is simple:
for each round of the outer loop, the inner one is executed in full. In this case, that means “linear times
linear,” which is quadratic. In other words, the running time is Θ(n · n) = Θ(n²). Actually, this
multiplication rule means that for further levels of nesting, we will just increment the power (that is, the
exponent). Three nested linear loops give us Θ(n³), four give us Θ(n⁴), and so forth.
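
The multiplication rule is easy to verify by counting. In the following sketch, count_steps is a hypothetical helper of mine; it uses itertools.product, which iterates over the same combinations as the corresponding nested for loops would.

```python
from itertools import product

def count_steps(n, depth):
    """Count how often the innermost statement would run in `depth`
    nested loops that each iterate over n items."""
    steps = 0
    for _ in product(range(n), repeat=depth):   # Same as nested for loops
        steps += 1
    return steps

for depth in (1, 2, 3):
    print(depth, [count_steps(n, depth) for n in (1, 2, 4, 8)])
```

Each time n doubles, the count doubles for one loop, quadruples for two, and grows eightfold for three, just as n, n², and n³ would.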

The sequential and nested cases can be mixed, of course. Consider the following slight extension: 

s = 0
for x in seq:
    for y in seq:
        s += x*y
    for z in seq:
        for w in seq:
            s += x-w



7 If the x elements are ints, the running time of each += is constant. However, Python also supports big integers, or 
longs, which automatically appear when your integers get big enough. This means that you can break the constant- 
time assumption by using really huge numbers. If you’re using floats, that won’t happen (but see the discussion of 
float problems near the end of the chapter). 

It may not be entirely clear what we’re computing here (I certainly have no idea), but we should still
be able to find the running time, using our rules. The z-loop is run for a linear number of iterations, and
it contains a linear loop, so the total complexity there is quadratic, or Θ(n²). The y-loop is clearly Θ(n).
This means that the code block inside the x-loop is Θ(n + n²). This entire block is executed for each
round of the x-loop, which is run n times. We use our multiplication rule and get Θ(n(n + n²)) = Θ(n² + n³)
= Θ(n³), that is, cubic. We could arrive at this conclusion even more easily by noting that the y-loop is
dominated by the z-loop and can be ignored, giving the inner block a quadratic running time.

“Quadratic times linear” gives us cubic. 
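
If you don’t quite trust the algebra, you can count the steps directly. This sketch mirrors the loop structure of the mixed example, replacing the two innermost statements with a counter (the name count_steps is my own).

```python
def count_steps(seq):
    """Count executions of the two innermost statements in the
    mixed sequential/nested example."""
    steps = 0
    for x in seq:
        for y in seq:
            steps += 1          # Stands in for s += x*y
        for z in seq:
            for w in seq:
                steps += 1      # Stands in for s += x-w
    return steps

for n in (10, 20, 40):
    print(n, count_steps(range(n)), n**2 + n**3)
```

The two printed counts agree for every n, and doubling n makes the total roughly eight times larger, as expected for a cubic running time.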

The loops need not all be repeated Θ(n) times, of course. Let’s say we have two sequences, seq1 and
seq2, where seq1 contains n elements and seq2 contains m elements. The following code will then have a
running time of Θ(nm):

s = 0
for x in seq1:
    for y in seq2:
        s += x*y

In fact, the inner loop need not even be executed the same number of times for each iteration of the 
outer loop. This is where things can get a bit fiddly. Instead of just multiplying two iteration counts (such 
as n and m in the previous example), we now have to sum the iteration counts of the inner loop. What 
that means should be clear in the following example:

seq1 = [[0, 1], [2], [3, 4, 5]]
s = 0
for seq2 in seq1:
    for x in seq2:
        s += x

The statement s += x is now performed 2 + 1 + 3 = 6 times. The length of seq2 gives us the running 
time of the inner loop, but because it varies, we cannot simply multiply it by the iteration count of the 
outer loop. A more realistic example is the following, which revisits our original example — multiplying 
every combination of elements from a sequence: 

s = 0
n = len(seq)
for i in range(n-1):
    for j in range(i+1, n):
        s += seq[i] * seq[j]

To avoid multiplying objects with themselves or adding the same product twice, the outer loop 
now avoids the last item, and the inner loop iterates over the items only after the one currently 
considered by the outer one. This is actually a lot less confusing than it might seem, but finding the 
complexity here requires a little bit more care. This is one of the important cases of counting that is 
covered in the next chapter. 8 



8 Spoiler: The complexity of this example is still Θ(n²).
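
Without anticipating the counting techniques of the next chapter, the spoiler can at least be checked empirically. The sketch below counts how often the inner statement runs and compares it with n(n-1)/2, the number of distinct pairs among n items; count_pairs is a name I made up for the illustration.

```python
def count_pairs(n):
    """Count executions of the inner statement for a sequence of length n."""
    steps = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            steps += 1          # Stands in for s += seq[i] * seq[j]
    return steps

for n in (10, 100, 1000):
    print(n, count_pairs(n), n * (n - 1) // 2)
```

The count is about half of n², and halving is just a constant factor, so the growth is still quadratic.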

Three Important Cases

Until now, we have assumed that the running time is completely deterministic and dependent only on
input size, not on the actual contents of the input. That is not particularly realistic, however. For
example, if you were to construct a sorting algorithm, you might start like this:

def sort_w_check(seq):
    n = len(seq)
    for i in range(n-1):
        if seq[i] > seq[i+1]:
            break
    else:
        return
    ...  # The actual sorting would follow here



A check is performed before getting into the actual sorting: if the sequence is already sorted, the 
function simply returns. 



Note The optional else clause on a loop in Python is executed if the loop has not been ended prematurely by a
break statement.



This means that no matter how inefficient our main sorting is, the running time will always be linear 
if the sequence is already sorted. No sorting algorithm can achieve linear running time in general, 
meaning that this “best-case scenario” is an anomaly, and all of a sudden, we can’t reliably predict the
running time anymore. The solution to this quandary is to be more specific. Instead of talking about a 
problem in general, we can specify the input more narrowly, and we often talk about one of three 
important cases: 

• The best case. This is the running time you get when the input is optimally suited
to your algorithm. For example, if the input sequence to sort_w_check were sorted,
we would get the best-case running time (which would be linear).

• The worst case. This is usually the most useful case: the worst possible running
time. This is useful because we normally want to be able to give some guarantees
about the efficiency of our algorithm, and this is the best guarantee we can give in
general.

• The average case. This is a tricky one, and I’ll avoid it most of the time, but in
some cases it can be useful. Simply put, it’s the expected value of the running
time, for random input (with a given probability distribution).
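
To see the best case in action, here is a runnable variant of sort_w_check. Two things in it are assumptions of mine rather than part of the original function: the built-in list.sort stands in for the unspecified “main” sorting, and the function returns a Boolean telling us whether the initial check was enough.

```python
def sort_w_check(seq):
    """Sort seq in place; return True if the initial check found it sorted.

    The return value and the use of list.sort are additions made for
    illustration only.
    """
    n = len(seq)
    for i in range(n - 1):
        if seq[i] > seq[i + 1]:
            break               # Out of order: we need to sort
    else:
        return True             # Best case: one linear scan, no sorting
    seq.sort()                  # Assumed stand-in for the main sorting
    return False

best = list(range(10))
other = [3, 1, 2]
print(sort_w_check(best), best)
print(sort_w_check(other), other)
```

For the already sorted input, only the linear check is performed; for the other input, the check is abandoned at the first out-of-order pair and the full sorting takes over.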

In many of the algorithms we’ll be working with, these three cases have the same complexity. When
they don’t, we’ll often be working with the worst case. Unless this is stated explicitly, however, no
assumptions can be made about which case is being studied. In fact, we may not be restricting ourselves
to a single kind of input at all. What if, for example, we wanted to describe the running time of
sort_w_check in general? This is still possible, but we can’t be quite as precise.

Let’s say the main sorting algorithm we’re using (after the check) is loglinear (that is, it has a running
time of Θ(n lg n)), which is very typical (and, in fact, optimal in the general case) for sorting algorithms.
The best-case running time of our algorithm is then Θ(n) (when the check uncovers a sorted sequence),
and the worst-case running time is Θ(n lg n). If we want to give a description of the running time in
general, however (for any kind of input), we cannot use the Θ notation at all. There is no single function
describing the running time; different types of inputs have different running time functions, and these
have different asymptotic complexity, meaning we can’t sum them up in a single Θ expression.

The solution? Instead of the “twin bounds” of Θ, we only supply an upper or lower limit, using O or
Ω. We can, for example, say that sort_w_check has a running time of O(n lg n). This covers both the best
and worst cases. Similarly, we could say it has a running time of Ω(n). Note that these limits are as tight
as we can make them.



Note It is perfectly acceptable to use any of our asymptotic operators to describe any of the three cases
discussed here. We could very well say that the worst-case running time of sort_w_check is Ω(n lg n), for
example, or that the best case is O(n).



Empirical Evaluation of Algorithms 

The main focus of this book is algorithm design (and its close relative, algorithm analysis). There is,
however, another important discipline of algorithmics that can be of vital importance when building
real-world systems, and that is algorithm engineering, the art of efficiently implementing algorithms. In a
way, algorithm design can be seen as a way of achieving low asymptotic running time (by designing 
efficient algorithms), while algorithm engineering is focused on reducing the hidden constants in that 
asymptotic complexity. 

Although I may offer some tips on algorithm engineering in Python here and there, it can be hard to 
predict exactly which tweaks and hacks will give you the best performance for the specific problems 
you’re working on — or, indeed, for your hardware or version of Python. (These are exactly the kind of 
quirks asymptotics are designed to avoid.) And in some cases, such tweaks and hacks may not be needed 
at all, because your program may be fast enough as it is. The most useful thing you can do in many cases 
is simply to try and see. If you have a tweak you think will improve your program, try it! Implement the 
tweak, and run some experiments. Is there an improvement? And if the tweak makes your code less 
readable and the improvement is small, is it really worth it? 



Note This section is about evaluating your programs, not about the engineering itself. For some hints on speeding 
up Python programs, see Appendix A. 



While there are theoretical aspects of so-called experimental algorithmics (that is, experimentally 
evaluating algorithms and their implementations) that are beyond the scope of this book, I’ll give you 
some practical starting tips that should get you pretty far. 

Tip 1: If possible, don’t worry about it. 

Worrying about asymptotic complexity can be very important. Sometimes, it’s the difference 
between a solution and what is, in practice, a nonsolution. Constant factors in the running time,
however, are often not all that critical. Try a straightforward implementation of your algorithm first, and
see whether that’s good enough. (Actually, you might even try a naive algorithm first; to quote 
programming guru Ken Thompson, “When in doubt, use brute force.” Brute force, in algorithmics, 
generally refers to a straightforward approach that just tries every possible solution, running time be 
damned!) If it works, it works. 

Tip 2: For timing things, use timeit. 

The timeit module is designed to perform relatively reliable timings. Although getting truly 
trustworthy results (such as those you’d publish in a scientific paper) is a lot of work, timeit can help 
you get “good enough in practice” timings very easily. For example: 

>>> import timeit
>>> timeit.timeit("x = 2 + 2")
0.034976959228515625
>>> timeit.timeit("x = sum(range(10))")
0.92387008666992188

The actual timing values you get will quite certainly not be exactly like mine. If you want to time a 
function (which could, for example, be a test function wrapping parts of your code), it may be even 
easier to use timeit from the shell command line, using the -m switch: 

$ python -m timeit -s"import mymodule as m" "m.myfunction()" 

There is one thing you should be very careful about when using timeit: avoid side effects that will 
affect repeated execution. The timeit function will run your code multiple times for increased precision, 
and if earlier executions affect later runs, you are probably in trouble. For example, if you time 
something like mylist.sort(), the list would get sorted only the first time. The other thousands of times
the statement is run, the list will already be sorted, making your timings unrealistically low. The same
caution would apply to anything involving generators or iterators that could be exhausted, for example.
More details on this module and how it works can be found in the standard library documentation. 9
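
The side-effect problem is easy to reproduce. In this sketch, the list size and the number of runs are arbitrary choices of mine; the setup string builds a random list once, and the two statements are then timed against it. Using sorted is just one possible workaround, and it also pays for building a new list on every run.

```python
import timeit

setup = "import random; data = [random.random() for _ in range(1000)]"

# Misleading: data is sorted in place by the first run, so the remaining
# runs time the sorting of an already sorted list.
in_place = timeit.timeit("data.sort()", setup=setup, number=1000)

# Safer: sorted() builds a new list each time and leaves data untouched,
# so every run faces the same unsorted input.
fresh = timeit.timeit("sorted(data)", setup=setup, number=1000)

print(in_place, fresh)
```

The exact numbers will vary from machine to machine, but the two timings measure quite different things, even though both statements “sort the list.”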

Tip 3: To find bottlenecks, use a profiler. 

It is a common practice to guess which part of your program needs optimization. Such guesses are 
quite often wrong. Instead of guessing wildly, let a profiler find out for you! Python comes with a few 
profiler variants, but the recommended one is cProfile. It’s as easy to use as timeit but gives more 
detailed information about where the execution time is spent. If your main function is main, you can use 
the profiler to run your program as follows: 

import cProfile
cProfile.run('main()')

This should print out timing results about the various functions in your program. If the cProfile 
module isn’t available on your system, use profile instead. Again, more information is available in the
library reference. If you’re not so interested in the details of your implementation but just want to
empirically examine the behavior of your algorithm on a given problem instance, the trace module in
the standard library can be useful: it can be used to count the number of times each statement is
executed.
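
If you want a bit more control over the report than cProfile.run gives you, the standard pstats module can sort and trim the results. The following is a minimal sketch; the functions main and slow_part are made-up stand-ins for your own program.

```python
import cProfile
import io
import pstats

def slow_part(n):
    """Hypothetical hot spot: sums the first n squares."""
    return sum(i * i for i in range(n))

def main():
    """Hypothetical main function of the program being profiled."""
    return [slow_part(20000) for _ in range(20)]

profiler = cProfile.Profile()
profiler.runcall(main)                          # Profile one call to main()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats(5)   # Five most expensive entries
report = out.getvalue()
print(report)
```

Sorting by cumulative time puts the functions that account for most of the running time (including the calls they make) at the top, which is usually where you want to start looking.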



9 http://docs.python.org/library 

Tip 4: Plot your results. 

Visualization can be a great tool when figuring things out. Two common plots for looking at 
performance are graphs, 10 for example of problem size vs. running time, and box plots, showing the 
distribution of running times. See Figure 2-2 for examples of these. A great package for plotting things 
with Python is matplotlib (available from http://matplotlib.sf.net). 

Tip 5: Be careful when drawing conclusions based on timing comparisons. 

This tip is a bit vague, but that’s because there are so many pitfalls when drawing conclusions about 
which way is better, based on timing experiments. First, any differences you observe may be because of 
random variations. If you’re using a tool such as timeit, this is less of a risk, because it repeats the 
statement to be timed many times (and even runs the whole experiment multiple times, keeping the 
best run). Still, there will be random variations, and if the difference between two implementations isn’t 
greater than what can be expected from this randomness, you can’t really conclude that they’re 
different. (You can’t conclude that they aren’t, either.) 
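
One way to get a feel for the random variation is to look at the individual experiments rather than a single number. The sketch below uses timeit.repeat for this; the two candidate statements and the repetition counts are arbitrary choices of mine.

```python
import timeit

setup = "seq = list(range(1000))"
candidates = {
    "for loop": "s = 0\nfor x in seq:\n    s += x",
    "built-in sum": "s = sum(seq)",
}

results = {}
for name, stmt in candidates.items():
    # Five independent experiments of 1000 executions each
    results[name] = timeit.repeat(stmt, setup=setup, repeat=5, number=1000)

for name, runs in results.items():
    # The spread between fastest and slowest run hints at the noise level
    print(name, min(runs), max(runs))
```

If the ranges of two candidates overlap heavily, the experiment probably doesn’t tell you which one is faster.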



Note If you need to draw a conclusion when it’s a close call, you can use the statistical technique of hypothesis 
testing. However, for practical purposes, if the difference is so small you’re not sure, it probably doesn’t matter 
which implementation you choose, so go with your favorite. 




Figure 2-2. Visualizing running times for programs A, B, and C and problem sizes 10-50 



10 No, not the network kind, which is discussed later in this chapter. The other kind — plots of some measurement for 
every value of some parameter. 

This problem is compounded if you’re comparing more than two implementations. The number of
pairs to compare increases quadratically with the number of versions (as explained in Chapter 3),
drastically increasing the chance that at least two of the versions will appear freakishly different, just by
chance. (This is what’s called the problem of multiple comparisons.) There are statistical solutions to this
problem, but the easiest practical way around it is to repeat the experiment with the two
implementations in question. Maybe even a couple of times. Do they still look different?

Second, there are issues when comparing averages. At the very least, you should stick to comparing 
averages of actual timings. A common practice to get more meaningful numbers when performing 
timing experiments is to normalize the running time of each program, dividing it by the running time of 
some standard, simple algorithm. This can indeed be useful but can in some cases make your results less 
than meaningful. See the paper “How not to lie with statistics: The correct way to summarize benchmark 
results” by Fleming and Wallace for a few pointers. For some other perspectives, you could read Bast and 
Weber’s “Don’t compare averages,” or the more recent paper by Citron et al., "The harmonic or 
geometric mean: does it really matter?” 

Third, your conclusions may not generalize. Similar experiments run on other problem instances or 
other hardware, for example, might yield different results. If others are to interpret or reproduce your 
experiments, it’s important that you thoroughly document how you performed them. 

Tip 6: Be careful when drawing conclusions about asymptotics from experiments. 

If you want to say something conclusively about the asymptotic behavior of an algorithm, you need 
to analyze it, as described earlier in this chapter. Experiments can give you hints, but they are by their 
nature finite, and asymptotics deal with what happens for arbitrarily large data sizes. On the other hand, 
unless you’re working in theoretical computer science, the purpose of asymptotic analysis is to say 
something about the behavior of the algorithm when implemented and run on actual problem 
instances, meaning that experiments should be relevant. 

Suppose you suspect that an algorithm has a quadratic running time complexity, but you're unable 
to conclusively prove it. Can you use experiments to support your claim? As explained, experiments (and 
algorithm engineering) deal mainly with constant factors, but there is a way. The main problem is that 
your hypothesis isn’t really testable (through experiments). If you claim that the algorithm is, say, O(n²), 
no data can confirm or refute this. However, if you make your hypothesis more specific, it becomes 
testable. You might, for example, based on some preliminary results, believe that the running time will 
never exceed 0.24n² + 0.1n + 0.03 seconds in your setup. (Perhaps more realistically, your hypothesis 
might involve the number of times a given operation is performed, which you can test with the trace 
module.) This is a testable (or, more specifically, refutable) hypothesis. If you run lots of experiments and 
you aren’t able to find any counterexamples, that supports your hypothesis to some extent. The neat 
thing is that, indirectly, you’re also supporting the claim that the algorithm is O(n²). 
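
As a rough sketch of what testing such a refutable hypothesis could look like, the following code counts the comparisons made by a simple insertion sort and searches random inputs for a counterexample to the bound 0.5n². The choice of insertion sort, the bound, and all the function names are assumptions made for illustration; they are not taken from the discussion above.

```python
import random

def insertion_sort_comparisons(seq):
    """Sort a copy of seq; return the number of comparisons performed."""
    a = list(seq)
    count = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            count += 1                      # One comparison of neighbors
            if a[j-1] <= a[j]:
                break                       # Already in place
            a[j-1], a[j] = a[j], a[j-1]     # Swap and keep moving left
            j -= 1
    return count

def bound(n):
    """The hypothetical bound being tested: at most 0.5*n**2 comparisons."""
    return 0.5 * n * n

def find_counterexample(trials=200, max_n=60, seed=0):
    """Return an input that breaks the bound, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randrange(max_n + 1)
        data = [rng.random() for _ in range(n)]
        if insertion_sort_comparisons(data) > bound(n):
            return data
    return None
```

Failing to find a counterexample does not prove the hypothesis, of course; it only means the hypothesis has survived this particular attempt at refuting it.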



Implementing Graphs and Trees 

The first example in Chapter 1, where we wanted to navigate Sweden and China, was typical of problems 
that can be expressed in one of the most powerful frameworks in algorithmics: that of graphs. In many 
cases, if you can formulate what you’re working on as a graph problem, you’re (at least) halfway to a 
solution. And if your problem instances are in some form expressible as trees, you stand a good chance 
of having a really efficient solution. 

Graphs can represent all kinds of structures and systems, from transportation networks to 
communication networks and from protein interactions in cell nuclei to human interactions online. You 
can increase their expressivity by adding extra data such as weights or distances, making it possible to 
represent such diverse problems as playing chess or matching a set of people to as many jobs, with the 
best possible use of their abilities. Trees are just a special kind of graph, so most algorithms and 
representations for graphs will work for them as well. However, because of their special properties (they 
are connected and have no cycles), some specialized (and quite simple) versions of both the 
representations and algorithms are possible. There are plenty of practical structures (such as XML 
documents or directory hierarchies) that can be represented as trees 11; this “special case” is actually 
quite general. 

If your memory of graph nomenclature is a bit rusty (or if this is all new to you), take a look at 
Appendix C, “Graph Terminology.” Here are the highlights in a nutshell: 

• A graph G = (V, E) consists of a set of nodes, V, and edges between them, E. If the 
edges have a direction, we say the graph is directed. 

• Nodes with an edge between them are adjacent. The edge is then incident to both. 
The nodes that are adjacent to v are the neighbors of v. 

• A subgraph of G = (V, E) consists of a subset of V and a subset of E. A path in G is a 
subgraph where the edges connect the nodes in a sequence, without revisiting any 
node. A cycle is like a path, except that the last edge links the last node to the first. 

• If we associate a weight with each edge in G, we say that G is a weighted graph. The 
length of a path or cycle is the sum of its edge weights, or, for unweighted graphs, 
simply the number of edges. 

• A forest is a cycle-free graph, and a connected forest is a tree. In other words, a 
forest consists of one or more trees. 

While phrasing your problem in graph terminology gets you far, if you want to implement a 
solution, you need to represent the graphs as data structures somehow. (This, in fact, applies even if you 
just want to design an algorithm, because you must know what the running times of different operations 
on your graph representation will be.) In some cases, the graph will already be present in your code or 
data, and no separate structure will be needed. For example, if you’re writing a web crawler, 
automatically collecting information about web sites by following links, the graph is the web itself. If you 
have a Person class with a friends attribute, which is a list of other Person instances, then your object 
model itself is a graph on which you can run various graph algorithms. There are, however, specialized 
ways of implementing graphs. 
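
To make the Person example concrete, here is a minimal sketch in which the object model itself serves as the graph. Only the Person class and its friends attribute come from the text; the befriend and reachable helpers are hypothetical names introduced for illustration.

```python
class Person:
    def __init__(self, name):
        self.name = name
        self.friends = []                   # Other Person instances

def befriend(p, q):
    """Add an undirected 'friendship' edge between p and q."""
    p.friends.append(q)
    q.friends.append(p)

def reachable(start):
    """Names of everyone connected to start through friendships."""
    seen = {start}
    stack = [start]
    while stack:
        person = stack.pop()
        for friend in person.friends:       # The neighbors of person
            if friend not in seen:
                seen.add(friend)
                stack.append(friend)
    return {p.name for p in seen}

alice, bob, carol, dave = [Person(n) for n in ("Alice", "Bob", "Carol", "Dave")]
befriend(alice, bob)
befriend(bob, carol)
```

Here reachable(alice) gives the names Alice, Bob, and Carol, while Dave, who has no friends in this toy data, can reach only himself. No separate graph structure was built; the traversal simply follows the friends attributes.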

In abstract terms, what we are generally looking for is a way of implementing the neighborhood 
function, N(v), so that N[v] is some form of container (or, in some cases, merely an iterable object) of the 
neighbors of v. Like so many other books on the subject, I will focus on the two most well-known 
representations, adjacency lists and adjacency matrices, because they are highly useful and general. For a 
discussion of alternatives, see the section “A Multitude of Representations” later in this chapter. 



11 With IDREFs and symlinks, respectively, XML documents and directory hierarchies are actually general graphs. 

BLACK BOX: DICT AND SET 



One technique covered in detail in most algorithm books, and usually taken for granted by Python 
programmers, is hashing. Hashing involves computing some (often seemingly random) integer value from 
an arbitrary object. This value can then be used, for example, as an index into an array (subject to some 
adjustments to make it fit the index range). 

The standard hashing mechanism in Python is available through the hash function: 

>>> hash(42) 
42 
>>> hash("Hello, world!") 
-1886531940 

This is the mechanism that is used in dictionaries, which are implemented using so-called hash tables. 
Sets are implemented using the same mechanism. The important thing is that the hash value can be 
constructed in essentially constant time (it’s constant with respect to the hash table size but linear as a 
function of the size of the object being hashed). If the array that is used behind the scenes is large enough, 
accessing it using a hash value is also Θ(1) in the average case. (The worst-case behavior is Θ(n), unless 
we know the values beforehand and can write a custom hash function. Still, hashing is extremely efficient 
in practice.) 

What this means to us is that accessing elements of a dict or set can be assumed to take constant 
(expected) time, which makes them highly useful building blocks for more complex structures and 
algorithms. 
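
The idea of using a hash value as an array index can be illustrated with a toy set type. This is only a sketch for building intuition: the class name ToyHashSet, the fixed table size, and the use of small lists to handle collisions are assumptions made here, and Python’s own dict and set are implemented quite differently (and far more efficiently).

```python
class ToyHashSet:
    def __init__(self, size=8):
        # The "array behind the scenes": one small list per slot
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, obj):
        # Adjust the hash value to fit the index range
        return self.buckets[hash(obj) % len(self.buckets)]

    def add(self, obj):
        bucket = self._bucket(obj)
        if obj not in bucket:               # Scans only one (short) bucket
            bucket.append(obj)

    def __contains__(self, obj):
        return obj in self._bucket(obj)

s = ToyHashSet()
s.add(42)
s.add("Hello, world!")
```

As long as the table is large compared to the number of elements, each bucket stays short, which is why lookups such as 42 in s take roughly constant time on average.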



Adjacency Lists and the Like 

One of the most intuitive ways of implementing graphs is using adjacency lists. Basically, for each node, 
we can access a list (or set or other container or iterable) of its neighbors. Let’s take the simplest way of 
implementing this, assuming we have n nodes, numbered 0...n-1. 



Note Nodes can be any objects, of course, or have arbitrary labels or names. Using integers in the range 
0...n-1 can make many implementations easier, though, because the node numbers can easily be used 
as indices. 



Each adjacency (or neighbor) list is then just a list of such numbers, and we can place the lists 
themselves into a main list of size n, indexable by the node numbers. Usually, the ordering of these lists 
is arbitrary, so we’re really talking about using lists to implement adjacency sets. The term list in this 
context is primarily historical. In Python we’re lucky enough to have a separate set type, which in many 
cases is a more natural choice. 

For an example that will be used to illustrate the various graph representations, see Figure 2-3. To 
begin with, assume that we have numbered the nodes (a = 0, b = 1, ...). The graph can then be 
represented in a straightforward manner, as shown in Listing 2-1. Just as a convenience, I have assigned 
the node numbers to variables with the same names as the node labels in the figure. You can, of course, 
just work with the numbers directly. Which adjacency list belongs to which node is indicated by the 
comments. If you want, take a minute to confirm that the representation does, indeed, correspond to 
the figure. 

Listing 2-1. A Straightforward Adjacency Set Representation 

a, b, c, d, e, f, g, h = range(8) 
N = [ 
    {b, c, d, e, f},    # a 
    {c, e},             # b 
    {d},                # c 
    {e},                # d 
    {f},                # e 
    {c, g, h},          # f 
    {f, h},             # g 
    {f, g}              # h 
] 



Note In Python versions prior to 2.7 (or 3.0), you would write set literals as set([1, 2, 3]) rather than 
{1, 2, 3}. Note that an empty set is still written set(), because {} is an empty dict. 




Figure 2-3. A sample graph used to illustrate various graph representations 

The name N has been used here to correspond with the N function discussed earlier. In graph 
theory, N(v) represents the set of v’s neighbors. Similarly, in our code, N[v] is now a set of v’s 
neighbors. Assuming you have defined N as earlier in an interactive interpreter, you can now play 
around with the graph: 

>>> b in N[a] # Neighborhood membership 
True 

>>> len(N[f]) # Degree 
3 



Tip If you have some code in a source file, such as the graph definition in Listing 2-1, and you want to explore it 
interactively, as in the previous example, you can run python with the -i switch, like this: 

python -i listing_2_1.py 

This will run the source file and start an interactive interpreter that continues where the source file left off, with any 
global definitions available for your experimentation. 



Another possible representation, which can have a bit less overhead in some cases, is to replace the 
adjacency sets with actual adjacency lists. For an example of this, see Listing 2-2. The same operations 
are now available, except that membership checking is now Θ(n). This is a significant slowdown, but that 
is only a problem if you actually need it, of course. (If all your algorithm does is iterate over neighbors, 
using set objects would not only be pointless; the overhead would actually be detrimental to the 
constant factors of your implementation.) 



Listing 2-2. Adjacency Lists 

a, b, c, d, e, f, g, h = range(8) 
N = [ 
    [b, c, d, e, f],    # a 
    [c, e],             # b 
    [d],                # c 
    [e],                # d 
    [f],                # e 
    [c, g, h],          # f 
    [f, h],             # g 
    [f, g]              # h 
] 



It might be argued that this representation is really a collection of adjacency arrays, rather than 
adjacency lists in the classical sense, because Python’s list type is really a dynamic array behind the 
covers (see the earlier black box sidebar about list). If you wanted, you could implement a linked list type 
and use that, rather than a Python list. That would allow you (asymptotically) cheaper inserts at arbitrary 
points in each list, but this is an operation you probably will not need, because you can just as easily 
append new neighbors at the end. The advantage of using list is that it is a well-tuned, very fast data 
structure (as opposed to any list structure you could implement in pure Python). 

A recurring theme when working with graphs is that the best representation depends on what you 
need to do with your graph. For example, using adjacency lists (or arrays) keeps the overhead low and 
lets you efficiently iterate over N(v) for any node v. However, checking whether u and v are neighbors 
takes time linear in the size of N(v), which can be problematic if the graph is dense (that is, if it has many 
edges). In these cases, adjacency sets may be the way to go. 



Tip We’ve also seen that deleting objects from the middle of a Python list is costly. Deleting from the end of a 
list takes constant time, though. If you don’t care about the order of the neighbors, you can delete arbitrary 
neighbors in constant time by overwriting them with the one that is currently last in the adjacency list, before 
calling the pop method. 
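
The trick described in this tip could be sketched as follows. The function name delete_at is hypothetical, and the sketch assumes you already know the position of the neighbor you want to remove (finding it by value would still take linear time).

```python
def delete_at(neighbors, i):
    """Remove neighbors[i] in constant time, without preserving order."""
    neighbors[i] = neighbors[-1]    # Overwrite with the last neighbor
    neighbors.pop()                 # Drop the (now duplicated) last item

adj = [1, 2, 3, 4]
delete_at(adj, 1)                   # Remove the neighbor at index 1
```

After the call, adj holds the remaining neighbors 1, 4, and 3, in that (changed) order. Note that this works even when the item being deleted is itself the last one.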



A slight variation on this would be to represent the neighbor sets as sorted lists. If you aren’t 
modifying the lists much, you can keep them sorted and use bisection (see the black box sidebar on 
bisect in Chapter 6) to check for membership, which might lead to slightly less overhead (in terms of 
memory use and iteration time) but would lead to a membership check complexity of Θ(lg k), where k is 
the number of neighbors for the given node. (This is still very low. In practice, though, using the built-in 
set type is a lot less hassle.) 
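
A membership check on sorted neighbor lists might look something like the following sketch, which uses bisect_left from the standard bisect module. The helper name has_neighbor and the small sample graph are assumptions made for illustration.

```python
from bisect import bisect_left

def has_neighbor(sorted_neighbors, v):
    """Check membership in a sorted list using bisection."""
    i = bisect_left(sorted_neighbors, v)    # Leftmost position for v
    return i < len(sorted_neighbors) and sorted_neighbors[i] == v

# A small graph with sorted adjacency lists, indexed by node number
N = [
    [1, 2, 4],      # Neighbors of node 0
    [2],            # Neighbors of node 1
    [],             # Neighbors of node 2
    [0, 4],         # Neighbors of node 3
    [3]             # Neighbors of node 4
]
```

With this, has_neighbor(N[0], 2) is true, while has_neighbor(N[0], 3) is false. If you do modify the lists, you would need to insert new neighbors at the right position (for example, with bisect.insort) to keep them sorted.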

Yet another minor tweak on this idea is to use dicts instead of sets or lists. The neighbors would then 
be keys in this dict, and you'd be free to associate each neighbor (or out-edge) with some extra value, 
such as an edge weight. How this might look is shown in Listing 2-3 (with arbitrary edge weights added). 



Listing 2-3. Adjacency dicts with Edge Weights 

a, b, c, d, e, f, g, h = range(8) 
N = [ 
    {b:2, c:1, d:3, e:9, f:4},    # a 
    {c:4, e:3},                   # b 
    {d:8},                        # c 
    {e:7},                        # d 
    {f:5},                        # e 
    {c:2, g:2, h:2},              # f 
    {f:1, h:6},                   # g 
    {f:9, g:8}                    # h 
] 



The adjacency dict version can be used just like the others, with the additional edge weight 
functionality: 



>>> b in N[a]        # Neighborhood membership 
True 
>>> len(N[f])        # Degree 
3 
>>> N[a][b]          # Edge weight for (a, b) 
2 

If you want, you can use adjacency dicts even if you don’t have any useful edge weights or the like, of 
course (using, perhaps, None, or some other placeholder instead). This would give you the main 
advantages of the adjacency sets, but it would also work with (very, very) old versions of Python, which 
don’t have the set type. 12 

Until now, the main collection containing our adjacency structures (be they lists, sets, or dicts) has 
been a list, indexed by the node number. A more flexible approach (allowing us to use arbitrary, 
hashable, node labels) is to use a dict as this main structure. 13 Listing 2-4 shows what a dict containing 
adjacency sets would look like. Note that nodes are now represented by characters. 



Listing 2-4. A Dict with Adjacency Sets 

N = { 
    'a': set('bcdef'), 
    'b': set('ce'), 
    'c': set('d'), 
    'd': set('e'), 
    'e': set('f'), 
    'f': set('cgh'), 
    'g': set('fh'), 
    'h': set('fg') 
} 



Note If you drop the set constructor in Listing 2-4, you end up with adjacency strings, which would work as 
well as (immutable) adjacency lists of characters (with slightly lower overhead). A seemingly silly representation, 
but as I’ve said before, it depends on the rest of your program. Where are you getting the graph data from? (Is it 
already in the form of text, for example?) How are you going to use it? 



Adjacency Matrices 

The other common form of graph representation is the adjacency matrix. The main difference is the 
following: instead of listing all neighbors for each node, we have one row (an array) with one position for 
each possible neighbor (that is, one for each node in the graph), and store a value (such as True or False), 
indicating whether that node is indeed a neighbor. Again, the simplest implementation is achieved using 
nested lists, as shown in Listing 2-5. Note that this, again, requires the nodes to be numbered from 0 to 
V-1. The truth values used are 1 and 0 (rather than True and False), simply to make the matrix more 
readable. 



12 Sets were introduced in Python 2.3, in the form of the sets module. The built-in set type has been available since 
Python 2.4. 

13 This, a dictionary with adjacency lists, is what Guido van Rossum uses in his article “Python Patterns - 
Implementing Graphs,” which is found online at http://www.python.org/doc/essays/graphs.html 

Listing 2-5. An Adjacency Matrix, Implemented with Nested Lists 

a, b, c, d, e, f, g, h = range(8) 

#     a b c d e f g h 

N = [[0,1,1,1,1,1,0,0], # a 
     [0,0,1,0,1,0,0,0], # b 
     [0,0,0,1,0,0,0,0], # c 
     [0,0,0,0,1,0,0,0], # d 
     [0,0,0,0,0,1,0,0], # e 
     [0,0,1,0,0,0,1,1], # f 
     [0,0,0,0,0,1,0,1], # g 
     [0,0,0,0,0,1,1,0]] # h 

The way we’d use this is slightly different from the adjacency lists/sets. Instead of checking whether 
b is in N[a], you would check whether the matrix cell N[a][b] is true. Also, you can no longer use 
len(N[a]) to find the number of neighbors, because all rows are of equal length. Instead, use sum: 

>>> N[a][b]        # Neighborhood membership 
1 
>>> sum(N[f])      # Degree 
3 



Adjacency matrices have some useful properties that are worth knowing about. First, as long as we 
aren’t allowing self-loops (that is, we’re not working with pseudographs), the diagonal is all false. Also, 
we often implement undirected graphs by adding edges in both directions to our representation. This 
means that the adjacency matrix for an undirected graph will be symmetric. 
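
The symmetry property can be illustrated with a small sketch. The helper names empty_matrix, add_undirected_edge, and is_symmetric are hypothetical, introduced here only to show one way of adding each undirected edge in both directions.

```python
def empty_matrix(n):
    """An n-by-n adjacency matrix with no edges."""
    return [[0]*n for _ in range(n)]

def add_undirected_edge(N, u, v):
    """Represent the undirected edge {u, v} as two directed edges."""
    N[u][v] = 1
    N[v][u] = 1

def is_symmetric(N):
    n = len(N)
    return all(N[u][v] == N[v][u] for u in range(n) for v in range(n))

N = empty_matrix(4)
add_undirected_edge(N, 0, 1)
add_undirected_edge(N, 1, 3)
```

Because every edge is entered in both directions, is_symmetric(N) holds here. Setting only N[2][0] = 1 afterward would give a directed edge and break the symmetry.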

Extending adjacency matrices to allow for edge weights is trivial: instead of storing truth values, 
simply store the weights. For an edge (u, v), let N[u][v] be the edge weight w(u, v) instead of True. Often, 
for practical reasons, we let nonexistent edges get an infinite weight. (This is to guarantee that they will 
not be included in, say, shortest paths, as long as we can find a path along existent edges.) It isn’t 
necessarily obvious how to represent infinity, but we do have some options. 

One possibility is to use an illegal weight value, such as None, or -1 if all weights are known to be 
non-negative. Perhaps more useful in many cases is using a really large value. For integral weights, you 
could use sys.maxint, even though it’s not guaranteed to be the greatest possible value (long ints can be 
greater). There is, however, one value that is designed to represent infinity among floats: inf. It’s not 
available directly under that name in Python, but you can get it with the expression float('inf'). 14 

Listing 2-6 shows what a weight matrix, implemented with nested lists, might look like. The same 
weights as in Listing 2-3 are used. 

I have named the infinity value here underscore (_), because it’s short, unintrusive, and visually 
distinct. Use whatever name you prefer, of course. Note also that the diagonal is still all zero, because 
even though we have no self-loops, weights are often interpreted as a form of distance, and the distance 
from a node to itself is customarily zero. 



14 This expression is guaranteed to work from Python 2.6 onward. In earlier versions, special floating-point values 
were platform-dependent, although float('inf') or float('Inf') should work on most platforms. 

Listing 2-6. A Weight Matrix with Infinite Weight for Missing Edges 

a, b, c, d, e, f, g, h = range(8) 
_ = float('inf') 

#     a b c d e f g h 

W = [[0,2,1,3,9,4,_,_], # a 
     [_,0,4,_,3,_,_,_], # b 
     [_,_,0,8,_,_,_,_], # c 
     [_,_,_,0,7,_,_,_], # d 
     [_,_,_,_,0,5,_,_], # e 
     [_,_,2,_,_,0,2,2], # f 
     [_,_,_,_,_,1,0,6], # g 
     [_,_,_,_,_,9,8,0]] # h 

Weight matrices make it easy to access edge weights, of course, but membership checking and 
finding the degree of a node, for example, or even iterating over neighbors must be done a bit differently 
now. You need to take the infinity value into account, for example, like this (using inf = float('inf') for 
more readable code): 

>>> W[a][b] < inf                           # Neighborhood membership 
True 
>>> W[c][e] < inf                           # Neighborhood membership 
False 
>>> sum(1 for w in W[a] if w < inf) - 1     # Degree 
5 



Note that 1 is subtracted from the degree sum, because we don’t want to count the diagonal. The 
degree calculation here is Θ(n), whereas both membership and degree could easily be found in constant 
time with the proper structure. Again, you should always keep in mind how you are going to use your 
graph and represent it accordingly. 
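
Iterating over neighbors in a weight matrix could be done along the following lines. This is only a sketch: the neighbors helper and the tiny three-node matrix are assumptions made for illustration, and the diagonal is skipped explicitly rather than by subtracting 1.

```python
inf = float('inf')

def neighbors(W, u):
    """Nodes v (other than u itself) with a finite weight W[u][v]."""
    return [v for v, w in enumerate(W[u]) if v != u and w < inf]

# A small weight matrix with infinite weight for missing edges
W = [[0,   2,   inf],
     [inf, 0,   4  ],
     [inf, inf, 0  ]]
```

Here neighbors(W, 0) gives [1], and neighbors(W, 2) gives an empty list. Each call scans a whole row, so it takes time linear in the number of nodes, regardless of how many neighbors the node actually has.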



SPECIAL-PURPOSE ARRAYS WITH NUMPY 



The NumPy library has a lot of functionality related to multidimensional arrays. We don’t really need much 
of that for graph representation, but the NumPy array type is quite useful, for example, for implementing 
adjacency or weight matrices. 

Where an empty list-based weight or adjacency matrix for n nodes is created, for example, like this 

>>> N = [[0]*10 for i in range(10)] 

in NumPy, you can use the zeros function: 

>>> import numpy as np 
>>> N = np.zeros([10,10]) 

The individual elements can then be accessed using comma-separated indices, as in A[u,v]. To access 
the neighbors of a given node, you use a single index, as in A[u]. 

The NumPy package is available from http://numpy.scipy.org. 

Note that you need to get a version of NumPy that will work with your Python version. If the most recent 
release of NumPy has not yet “caught up” with the Python version you want to use, you can compile and 
install directly from the source repository. You can get the source with the following command (assuming 
you have Subversion installed): 

svn co http://svn.scipy.org/svn/numpy/trunk numpy 

You can find more information about how to compile and install NumPy, as well as detailed documentation 
on its use, on the web site. 



Implementing Trees 

Any general graph representation can certainly be used to represent trees, because trees are simply a 
special kind of graph. However, trees play an important role on their own in algorithmics, and many 
special-purpose tree structures have been proposed. Most tree algorithms (even operations on search 
trees, discussed in Chapter 6) can be understood in terms of general graph ideas, but the specialized tree 
structures can make them easier to implement. 

It is easiest to specialize the representation of rooted trees, where each edge is pointed downward, 
away from the root. Such trees often represent hierarchical partitionings of a data set, where the root 
represents all the objects (which are, perhaps, kept in the leaf nodes), while each internal node 
represents the objects found as leaves in the tree rooted at that node. You can even use this intuition 
directly, making each subtree a list containing its child subtrees. Consider the simple tree shown in 
Figure 2-4. 

We could represent that tree with lists of lists, like this: 

>>> T = [["a", "b"], ["c"], ["d", ["e", "f"]]] 
>>> T[0][1] 
'b' 
>>> T[2][1][0] 
'e' 

Figure 2-4. A sample tree with a highlighted path from the root to a leaf 

Each list is, in a way, a neighbor (or child) list of the (anonymous) internal nodes. In the second 
example, we access the third child of the root, the second child of that child, and finally the first child of 
that (path highlighted in the figure). 

In some cases, we may know the maximum number of children allowed in any internal node. (For 
example, a binary tree is one where each internal node has a maximum of two children.) We can then 
use other representations, even objects with an attribute for each child, as shown in Listing 2-7. 

Listing 2-7. A Binary Tree Class 

class Tree: 
    def __init__(self, left, right): 
        self.left = left 
        self.right = right 

You can use the Tree class like this: 

>>> t = Tree(Tree("a", "b"), Tree("c", "d")) 
>>> t.right.left 
'c' 

You can, for example, use None to indicate missing children (for example, if a node has only one 
child). You are, of course, free to combine techniques such as these to your heart’s content (for example, 
using a child list or child set in each node instance). 

A common way of implementing trees, especially in languages that don’t have built-in lists, is the 
“first child, next sibling” representation. Here, each tree node has two “pointers,” or attributes 
referencing other nodes, just like in the binary tree case. However, the first of these refers to the first 
child of the node, while the second refers to its next sibling (as the name implies). In other words, each 
tree node refers to a linked list of siblings (its children), and each of these siblings refers to a linked list of 
its own. (See the black box sidebar on list, earlier in this chapter, for a brief intro to linked lists.) Thus, a 
slight modification of the binary tree in Listing 2-7 gives us a multiway tree, as shown in Listing 2-8. 

Listing 2-8. A Multiway Tree Class 

class Tree: 
    def __init__(self, kids, next=None): 
        self.kids = self.val = kids 
        self.next = next 

The separate val attribute here is just to have a more descriptive name when supplying a value 
(such as ' c ' ) instead of a child node. Feel free to adjust this as you want, of course. Here’s an example of 
how you can access this structure: 

>>> t = Tree(Tree("a", Tree("b", Tree("c", Tree("d"))))) 
>>> t.kids.next.next.val 
'c' 

And here’s what that tree looks like: 

The kids and next attributes are drawn as dotted arrows, while the (implicit) edges of the trees are 
drawn solid. Note that I’ve cheated a bit, and not drawn separate nodes for the strings "a", "b", and so 
on; instead, I’ve treated them as labels on their parent nodes. In a more sophisticated tree structure, you 
might have a separate value field in addition to kids, instead of using the one attribute for both purposes. 

Normally, you’d probably use more elaborate code (involving loops or recursion) to traverse the tree 
structure than the hard-coded path in this example. More on that in Chapter 5. In Chapter 6, you’ll also 
see some discussion about multiway trees and tree balancing. 
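
As a small taste of what such traversal code might look like, here is a sketch that walks a “first child, next sibling” tree recursively. The Tree class repeats Listing 2-8; the values generator is a hypothetical helper, and it assumes that a node whose kids attribute is not a Tree instance is a leaf holding a value.

```python
class Tree:
    def __init__(self, kids, next=None):
        self.kids = self.val = kids
        self.next = next

def values(node):
    """Yield the leaf values below node, from left to right."""
    child = node.kids
    while child is not None:                # Follow the sibling list
        if isinstance(child.kids, Tree):
            for val in values(child):       # Internal node: recurse
                yield val
        else:
            yield child.val                 # Leaf: kids doubles as val
        child = child.next

t = Tree(Tree("a", Tree("b", Tree("c", Tree("d")))))
```

With this, list(values(t)) gives the four strings "a", "b", "c", and "d", in that order. The loop handles the sibling links, while the recursion handles the child links.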



THE BUNCH PATTERN 



When prototyping (or even finalizing) data structures such as trees, it can be useful to have a flexible class 
that will allow you to specify arbitrary attributes in the constructor. In these cases, the “Bunch” pattern 
(named by Alex Martelli in the Python Cookbook) can come in handy. There are many ways of 
implementing it, but the gist of it is the following: 

class Bunch(dict): 
    def __init__(self, *args, **kwds): 
        super(Bunch, self).__init__(*args, **kwds) 
        self.__dict__ = self 

There are several useful aspects to this pattern. First, it lets you create and set arbitrary attributes by 
supplying them as keyword arguments: 

>>> x = Bunch(name="Jayne Cobb", position="Public Relations") 
>>> x.name 
'Jayne Cobb' 

Second, by subclassing dict, you get lots of functionality for free, such as iterating over the keys/attributes 
or easily checking whether an attribute is present. Here’s an example: 

>>> T = Bunch 
>>> t = T(left=T(left="a", right="b"), right=T(left="c")) 
>>> t.left 
{'right': 'b', 'left': 'a'} 
>>> t.left.right 
'b' 
>>> t['left']['right'] 
'b' 
>>> "left" in t.right 
True 
>>> "right" in t.right 
False 

This pattern isn’t useful only when building trees, of course. You could use it for any situation where you’d 
want a flexible object whose attributes you could set in the constructor. 

A Multitude of Representations 

Even though there are a host of graph representations in use, most students of algorithms learn only the 
two types covered (with variations) so far in this chapter. Jeremy P. Spinrad writes, in his book Efficient 
Graph Representations, that most introductory texts are “particularly irritating” to him as a researcher in 
computer representations of graphs. Their formal definitions of the most well-known representations 
(adjacency matrices and adjacency lists) are mostly adequate, but the more general explanations are 
often faulty. He presents, based on misstatements from several texts, the following strawman’s 15 
comments on graph representations: 

There are two methods for representing a graph in a computer; adjacency matrices, 
and adjacency lists. It is faster to work with adjacency matrices, but they use more 
space than adjacency lists, so you will choose one or the other depending on which 
resource is more important to you. (p. 9) 

These statements are problematic in several ways, as Spinrad points out. First, there are many 
interesting ways of representing graphs, not just the two listed here. For example, there are edge lists (or 
edge sets), which are simply lists containing all edges as node pairs (or even special edge objects); there 
are incidence matrices, indicating which edges are incident on which nodes (useful for multigraphs); and 
there are specialized methods for graph types such as trees (described earlier) and interval graphs (not 
discussed here). Take a look at Spinrad's book for more representations than you will probably ever 
need. Second, the idea of space/time trade-off is quite misleading: there are problems that can be solved 
faster with adjacency lists than with adjacency arrays, and for random graphs, adjacency lists can 
actually use more space than adjacency matrices. 
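
To give an idea of one of these alternatives, the following sketch converts between adjacency sets and an edge list. The function names edge_list and adjacency_sets are hypothetical, and the sketch assumes a directed graph with nodes numbered from 0.

```python
def edge_list(N):
    """All edges of N (a list of adjacency sets) as (u, v) pairs."""
    return [(u, v) for u in range(len(N)) for v in sorted(N[u])]

def adjacency_sets(n, edges):
    """Rebuild a list of n adjacency sets from an edge list."""
    N = [set() for _ in range(n)]
    for u, v in edges:
        N[u].add(v)
    return N

N = [{1, 2}, {2}, set()]
E = edge_list(N)                    # [(0, 1), (0, 2), (1, 2)]
```

An edge list makes it easy to iterate over all edges, which is what some algorithms need, but finding the neighbors of a given node requires scanning the entire list.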

Rather than relying on simple, generalized statements such as the previous strawman’s comments, 
you should consider the specifics of your problem. The main criterion would probably be the asymptotic 
performance for what you’re doing. For example, looking up the edge (u, v) in an adjacency matrix is 
Θ(1), while iterating over v’s neighbors is Θ(n); in an adjacency list representation, both operations will 
be Θ(d(v)), that is, on the order of the number of neighbors the node has. If the asymptotic complexity of 
your algorithm is the same regardless of representation, you could perform some empirical tests, as 
discussed earlier in this chapter. Or, in many cases, you should simply choose the representation that 
makes your code clear and easily maintainable. 

An important type of graph implementation not discussed so far is more of a nonrepresentation: 
many problems have an inherent graphical structure — perhaps even a tree structure — and we can apply 
graph (or tree) algorithms to them without explicitly constructing a representation. In some cases, this 
happens when the representation is external to our program. For example, when parsing XML 
documents or traversing directories in the file system, the tree structures are just there, with existing 
APIs. In other cases, we are constructing the graph ourselves, but it is implicit. For example, if you want 
to find the most efficient solution to a given configuration of Rubik’s Cube, you could define a cube 
state, as well as operators for modifying that state. Even though you don’t explicitly instantiate and store 
all possible configurations, the possible states form an implicit graph (or node set), with the change 
operators as edges. You could then use an algorithm such as A* or Bidirectional Dijkstra (both discussed 
in Chapter 9) to find the shortest path to the solved state. In such cases, the neighborhood function N(v) 
would compute the neighbors on the fly, possibly returning them as a collection or some other form of 
iterable object. 
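
A toy version of such an implicit graph might look like the sketch below. Everything in it is an assumption made for illustration: the states are positive integers, the two operators are “add one” and “double,” and a plain breadth-first search is used in place of A* or Bidirectional Dijkstra, simply because it is short.

```python
from collections import deque

def N(v):
    """Compute the neighbors of state v on the fly."""
    yield v + 1                     # Operator 1: add one
    yield v * 2                     # Operator 2: double

def shortest_path_length(start, goal):
    """Fewest operator applications from start to goal (1 <= start <= goal)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        v, dist = queue.popleft()
        if v == goal:
            return dist
        for u in N(v):
            # Both operators increase the state, so overshooting is useless
            if u <= goal and u not in seen:
                seen.add(u)
                queue.append((u, dist + 1))
    return None
```

For example, shortest_path_length(1, 10) is 4, corresponding to the path 1, 2, 4, 5, 10. The graph is never stored anywhere; only the states actually visited are kept in memory.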

15 That is, the comments are inadequate and are presented to demonstrate the problem with most explanations. 

The final kind of graph I’ll touch upon in this chapter is the subproblem graph. This is a rather deep 
concept that I’ll revisit several times, when discussing different algorithmic techniques. In short, most 
problems can be decomposed into subproblems: smaller problems that often have quite similar 
structure. These form the nodes of the subproblem graph, and the dependencies (that is, which 
subproblems depend on which) form the edges. Although we rarely apply graph algorithms directly to 
such subproblem graphs (they are more of a conceptual or mental tool), they do offer significant insights 
into such techniques as divide and conquer (Chapter 6) and dynamic programming (Chapter 8). 
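
Just to make the idea tangible, here is a sketch that writes out the subproblem graph for computing Fibonacci numbers, where the subproblem for k depends on those for k-1 and k-2. The choice of Fibonacci numbers and the function name subproblem_graph are assumptions made for illustration; as noted, you would rarely build such a graph explicitly.

```python
def subproblem_graph(n):
    """Map each subproblem k (0 <= k <= n) to the ones it depends on."""
    return {k: ({k - 1, k - 2} if k > 1 else set()) for k in range(n + 1)}

G = subproblem_graph(4)
```

The result is a dict with adjacency sets, as in Listing 2-4: G[4] is {2, 3}, while the base cases G[0] and G[1] have no dependencies. Note that the subproblem for 2 is needed by both 3 and 4; such shared subproblems are what dynamic programming exploits.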



GRAPH LIBRARIES 



The basic representation techniques described in this chapter will probably be enough for most of your 
graph algorithm coding, especially with some customization. However, there are some advanced 
operations and manipulations that can be tricky to implement, such as temporarily hiding or combining 
nodes, for example. There are some third-party libraries out there that take care of some of these things, 
and some of them are even implemented as C extensions, potentially leading to performance increases as a 
bonus. They can also be quite convenient to work with, and some of them have several graph algorithms 
available out of the box. While a quick web search will probably turn up the most actively supported graph 
libraries, here are a few to get you started: 

• NetworkX: http://networkx.lanl.gov 

• python-graph: http://code.google.com/p/python-graph 

• Graphine: http://gitorious.org/projects/graphine/pages/Home 

There is also Pygr, a graph database (http://bioinfo.mbi.ucla.edu/pygr); Gato, a graph animation 
toolbox (http://gato.sourceforge.net); and PADS, a collection of graph algorithms 
(http://www.ics.uci.edu/~eppstein/PADS). 



Beware of Black Boxes 

While algorists generally work at a rather abstract level, actually implementing your algorithms takes 
some care. When programming, you’re bound to rely on components that you did not write yourself, 
and relying on such “black boxes” without any idea of their contents is a risky business. Throughout this 
book, you’ll find sidebars marked “Black Box,” briefly discussing various algorithms available as part of 
Python, either built into the language or found in the standard library. I’ve included these because I 
think they’re instructive; they tell you something about how Python works, and they give you glimpses of 
a few more basic algorithms. 

However, these are not the only black boxes you’ll encounter. Not by a long shot. Both Python and 
the machinery it rests on use many mechanisms that can trip you up if you’re not careful. In general, the 
more important your program, the more you should mistrust such black boxes and seek to find out 
what’s going on under the cover. I’ll show you two traps that you should be aware of in the 
following sections, but if you take nothing else away from this section, remember the following: 

• When performance is important, rely on actual profiling rather than intuition. You 
may have hidden bottlenecks, and they may be nowhere near where you suspect 
they are. 

• When correctness is critical, the best thing you can do is calculate your answer 
more than once, using separate implementations (preferably written by separate 
programmers). 

The latter principle of redundancy is used in many performance-critical systems and is also one of 
the key pieces of advice given by Foreman S. Acton in his book Real Computing Made Real, on 
preventing calculating errors in scientific and engineering software. Of course, in every scenario, you 
have to weigh the costs of correctness and performance against their value. (For example, as I said 
before, if your program is fast enough, there’s no need to optimize it.) 
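
The two principles above can be sketched together using only the standard library. This is a minimal illustration, assuming two hypothetical functions (has_duplicates_list and has_duplicates_set) that solve the same small problem in independent ways; the profiler used is cProfile, with pstats for the report.

```python
import cProfile
import io
import pstats

def has_duplicates_list(values):
    """First implementation: remembers seen values in a list."""
    seen = []
    for v in values:
        if v in seen:
            return True
        seen.append(v)
    return False

def has_duplicates_set(values):
    """Second, independent implementation: compares sizes."""
    return len(set(values)) != len(values)

data = list(range(2000))

profiler = cProfile.Profile()
profiler.enable()
first = has_duplicates_list(data)
second = has_duplicates_set(data)
profiler.disable()

# Redundancy: the two implementations should agree.
assert first == second

# Profiling: collect the report as text, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report shows where the time actually went, which is often not where intuition would have pointed.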

The following two sections deal with two rather different topics. The first is about hidden 
performance traps: operations that seem innocent enough, but that can turn a linear operation into a 
quadratic one. The second is about a topic that is not often discussed in algorithm books but is 
important to be aware of: the many traps of computing with floating-point numbers. 



Hidden Squares 

Consider the following two ways of looking for an element in a list: 

>>> from random import randrange 
>>> L = [randrange(10000) for i in range(1000)] 
>>> 42 in L 
False 
>>> S = set(L) 
>>> 42 in S 
False 

They’re both pretty fast, and it might seem pointless to create a set from the list — unnecessary work, 
right? Well, it depends. If you’re going to do many membership checks, it might pay off, because 
membership checks are linear for lists and constant for sets. What if, for example, you were to gradually 
add values to a collection and for each step check whether the value was already added? This is a 
situation you’ll encounter repeatedly throughout the book. Using a list would give you quadratic 
running time, whereas using a set would be linear. A huge difference. The lesson is that it’s important to 
pick the right built-in data structure for the job. 
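
To make the difference concrete, here is a small sketch of the “add values gradually, checking each one first” scenario. The function names are hypothetical, and the timings depend entirely on your machine, so treat the printed numbers as an experiment rather than a result.

```python
from random import randrange
from timeit import timeit

def unique_with_list(values):
    """Keep first occurrences; each membership check scans the list."""
    seen = []
    for v in values:
        if v not in seen:          # linear-time check
            seen.append(v)
    return seen

def unique_with_set(values):
    """Same result, but membership checks go through a set."""
    seen = set()
    result = []
    for v in values:
        if v not in seen:          # constant-time check (on average)
            seen.add(v)
            result.append(v)
    return result

values = [randrange(10000) for i in range(2000)]
assert unique_with_list(values) == unique_with_set(values)

print("list:", timeit(lambda: unique_with_list(values), number=3))
print("set: ", timeit(lambda: unique_with_set(values), number=3))
```

Try increasing the size of values; the gap between the two versions should widen quickly.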

The same holds for the example discussed earlier, about using a deque rather than inserting objects 
at the beginning of a list. But there are some examples that are less obvious that can cause just as many 
problems. Take, for example, the “obvious” way of gradually building a string: 

>>> s = "" 
>>> for chunk in input(): 
...     s += chunk 

It works, and because of some really clever optimizations in Python, it actually works pretty well, up 
to a certain size. But then the optimizations break down, and you run smack into quadratic growth. The 
problem is that (without the optimizations) you need to create a new string for every += operation, 
copying the contents of the previous one. You’ll see a detailed discussion of why this sort of thing is 
quadratic in the next chapter, but for now, just be aware that this is risky business. A better solution 
would be the following: 

>>> chunks = [] 
>>> for chunk in input(): 
...     chunks.append(chunk) 
... 
>>> s = ''.join(chunks) 

You could even simplify this further: 

>>> s = ''.join(input()) 

This version is efficient for the same reason that the earlier append examples were. Appending 
allows you to overallocate with a percentage so that the available space grows exponentially, and the 
append cost is constant when averaged (amortized) over all the operations. 
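
The amortization argument can be checked with a small simulation. This is only a sketch of an idealized dynamic array: the function count_copies and its growth parameter are assumptions made for illustration and do not reproduce the growth policy Python’s list actually uses.

```python
def count_copies(n, growth=2.0):
    """Simulate n appends to a dynamic array; count element copies on resize."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size         # move existing items to a bigger block
            capacity = max(capacity + 1, int(capacity * growth))
        size += 1
    return copies

n = 1000
print(count_copies(n))               # 1023: fewer than 2*n copies in total
print(count_copies(n, growth=1.125)) # still proportional to n
print(count_copies(n, growth=1.0))   # 499500: one extra slot at a time is quadratic
```

Growing by a fixed percentage keeps the total copying proportional to n, so each append costs a constant amount on average. Growing by a single slot at a time brings back the quadratic behavior of the += example.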

There are, however, quadratic running times that manage to hide even better than this. Consider 
the following solution, for example: 

>>> s = sum(input(), '') 

Python complains and asks you to use ''.join() instead (and rightly so). But what if you’re using lists? 

>>> result = sum(lists, []) 

This works, and it even looks rather elegant, but it really isn’t. You see, under the covers, the sum 
function doesn’t know all too much about what you’re summing, and it has to do one addition after 
another. That way, you’re right back at the quadratic running time of the += example for strings. Here’s a 
better way: 

>>> res = [] 
>>> for lst in lists: 
...     res.extend(lst) 

Just try timing both versions. As long as lists is pretty short, there won’t be much difference, but it 
shouldn’t take long before the sum version is thoroughly beaten. 
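
One way to run that timing experiment is with the timeit module. The wrapper functions and the test data here are assumptions made for the sake of the sketch, and the numbers you get will vary from machine to machine.

```python
from timeit import timeit

def flatten_with_sum(lists):
    """Repeated list addition: each step copies everything so far."""
    return sum(lists, [])

def flatten_with_extend(lists):
    """Appends each sublist's items to one growing result."""
    res = []
    for lst in lists:
        res.extend(lst)
    return res

lists = [[i] * 10 for i in range(1000)]
assert flatten_with_sum(lists) == flatten_with_extend(lists)

for func in (flatten_with_sum, flatten_with_extend):
    seconds = timeit(lambda: func(lists), number=3)
    print(func.__name__, seconds)
```

Doubling the number of sublists should roughly quadruple the time for the sum version but only double it for the extend version.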



The Trouble with Floats 

Most real numbers have no exact finite representation. The marvelous invention of floating-point 
numbers makes it seem like they do, though, and even though they give us a lot of computing power, 
they can also trip us up. Big time. In the second volume of The Art of Computer Programming, Knuth 
says, “Floating point computation is by nature inexact, and programmers can easily misuse it so that the 
computed answers consist almost entirely of ‘noise’.” 16 

Python is pretty good at hiding these issues from you, which can be a good thing if you’re seeking 
reassurance, but it may not help you figure out what’s really going on. For example, in current versions of 
Python, you’ll get the following reasonable behavior: 

>>> 0.1 
0.1 



It certainly looks like the number 0.1 is represented exactly. Unless you know better, it would 
probably surprise you to learn that it’s not. Try an earlier version of Python (say, 2.6), where the black 
box was slightly more transparent: 

>>> 0.1 
0.10000000000000001 



16 This kind of trouble has led to disaster more than once (see, for example, 
http://www.ima.umn.edu/~arnold/455.f96/disasters.html). 
Now we’re getting somewhere. Let’s go a step further (feel free to use an up-to-date Python here): 

>>> sum(0.1 for i in range(10)) == 1.0 
False 

Ouch! Not what you’d expect without previous knowledge of floats. 

The thing is, integers can be represented exactly in any number system, be it binary, decimal, or 
something else. Real numbers, though, are a bit trickier. The official Python tutorial has an excellent 
section on this, 17 and David Goldberg has written a great (and thorough) tutorial paper. The basic idea 
should be easy enough to grasp if you consider how you’d represent 1/3 as a decimal number. You can’t 
do it exactly, right? If you were using the ternary number system, though (base 3), it would be easily 
represented as 0.1. 
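
If you’d like to see what is actually stored for 0.1, the standard library can show you. This is a minimal sketch using the fractions and decimal modules; it simply converts the binary float to exact representations and is not meant as a recommended way of doing arithmetic.

```python
from decimal import Decimal
from fractions import Fraction

stored = Fraction(0.1)             # exact value of the float closest to 0.1
print(stored)                      # a ratio whose denominator is a power of two
print(Decimal(0.1))                # the same value, written out in decimal
print(stored == Fraction(1, 10))   # False

third = Fraction(1, 3)             # exact rational, no rounding involved
print(third * 3 == 1)              # True
```

The float is a binary fraction, so its denominator must be a power of two, and no such fraction equals one tenth. It is the same problem as writing 1/3 in decimal, only in base 2.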

The first lesson here is to never compare floats for equality. It generally doesn’t make sense. Still, in 
many applications (such as computational geometry), you'd very much like to do just that. Instead, you 
should check whether they are approximately equal. For example, you could take the approach of 
assertAlmostEqual from the unittest module: 

>>> def almost_equal(x, y, places=7): 
...     return round(abs(x-y), places) == 0 
... 
>>> almost_equal(sum(0.1 for i in range(10)), 1.0) 
True 
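
Rounding the difference to a fixed number of places is an absolute test, which may not suit very large or very small values. As one possible alternative (a different technique from the assertAlmostEqual approach shown above), newer versions of Python offer math.isclose, which uses a relative tolerance by default. The following sketch compares the two; the sample values are arbitrary.

```python
import math

def almost_equal(x, y, places=7):
    """Absolute comparison, as in the example above."""
    return round(abs(x - y), places) == 0

total = sum(0.1 for i in range(10))
print(math.isclose(total, 1.0))              # True

big = 1e20
print(almost_equal(big, big + 1e5))          # absolute test on huge values
print(math.isclose(big, big + 1e5))          # relative test on huge values

# Near zero, a relative tolerance alone is too strict; abs_tol helps.
print(math.isclose(1e-12, 0.0))
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))
```

Which kind of tolerance is appropriate depends on the application, so take this as a starting point for experimenting rather than a rule.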



There are also tools you can use if you need exact decimal floating-point numbers, for example the 
decimal module: 

>>> from decimal import * 
>>> sum(Decimal("0.1") for i in range(10)) == Decimal("1.0") 
True 



This module can be essential if you’re working with financial data, for example, where you need 
exact calculations with a certain number of decimals. In certain mathematical or scientific applications, 
you might find tools such as Sage useful: 18 

sage: 3/5 * 11/7 + sqrt(5239) 
13*sqrt(31) + 33/35 

As you can see, Sage does its math symbolically, so you get exact answers (although you can also get 
decimal approximations, if needed). This sort of symbolic math (or the decimal module) is nowhere near 
as efficient as using the built-in hardware capabilities for floating-point calculations, though. 

If you find yourself doing floating-point calculations where accuracy is key (that is, you’re not just 
sorting them or the like), a good source of information is Acton’s book, mentioned earlier. Let’s just 
briefly look at an example of his: you can easily lose significant digits if you subtract two nearly equal 
subexpressions. To achieve higher accuracy, you’ll need to rewrite your expressions. Consider, for 
example, the expression sqrt(x+1)-sqrt(x), where we assume that x is very big. The thing to do would 
be to get rid of the risky subtraction. By multiplying and dividing by sqrt(x+1)+sqrt(x), we end up with 
an expression that is mathematically equivalent to the original but where we have eliminated the 
subtraction: 1.0/(sqrt(x+1)+sqrt(x)). Let’s compare the two versions: 



17 http://docs.python.org/tutorial/floatingpoint.html 

18 Sage is a tool for mathematical computation in Python and is available from http://sagemath.org. 
>>> from math import sqrt 
>>> x = 8762348761.13 
>>> sqrt(x + 1) - sqrt(x) 
5.341455107554793e-06 
>>> 1.0/(sqrt(x + 1) + sqrt(x)) 
5.3414570026237696e-06 

As you can see, even though the expressions are equivalent mathematically, they give different 
answers (with the latter being more accurate). 
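
If you want to check the claim about accuracy yourself, one option is to compute a high-precision reference value with the decimal module. This is a sketch under the assumption that 50 significant digits is plenty for a reference; the variable names are mine.

```python
from decimal import Decimal, localcontext
from math import sqrt

x = 8762348761.13
naive = sqrt(x + 1) - sqrt(x)              # subtracts two nearly equal values
stable = 1.0 / (sqrt(x + 1) + sqrt(x))     # rewritten without the subtraction

with localcontext() as ctx:
    ctx.prec = 50                          # work with 50 significant digits
    dx = Decimal(x)                        # exact value of the float x
    reference = (dx + 1).sqrt() - dx.sqrt()
    naive_error = abs(Decimal(naive) - reference)
    stable_error = abs(Decimal(stable) - reference)

print("naive error: ", naive_error)
print("stable error:", stable_error)
print(stable_error < naive_error)          # True
```

The subtraction in the reference is harmless here because the extra digits absorb the cancellation, which is exactly what the 64-bit floats cannot do.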



A QUICK MATH REFRESHER 



If you’re not entirely comfortable with the formulas used in Table 2-1, here is a quick rundown of what 
they mean: A power, like $x^y$ (x to the power of y), is basically x times itself y times. More precisely, x occurs 
as a factor y times. Here, x is called the base, and y is the exponent (or sometimes the power). So, for 
example, $3^2 = 9$. Nested powers simply have their exponents multiplied: $(3^2)^4 = 3^8$. In Python, you write 
powers as x**y. 

A polynomial is just a sum of several powers, each with its own constant factor. For example, 

$9x^5 + 2x^2 + x + 3$. 

You can have fractional powers, too, as a kind of inverse: $(x^y)^{1/y} = x$. These are sometimes called roots, 
such as the square root for the inverse of squaring. In Python you can get square roots either using the 
sqrt function from the math module or simply using x**0.5. 

Roots are inverses in that they “undo” the effects of powers. Logarithms are another kind of inverse. Each 
logarithm has a fixed base; the most common one in algorithmics is the base-2 logarithm, written $\log_2$ or 
simply lg. (The base-10 logarithm is conventionally written simply log, while the so-called natural 
logarithm, with base e, is written ln.) The logarithm gives us the exponent we need (for the given base), so 
if $n = 2^k$, then $\lg n = k$. In Python, you can use the log function of the math module to get logarithms. 

The factorial, or n!, is calculated as $n \times (n-1) \times (n-2) \times \cdots \times 1$. It can be used, among other things, to 
calculate the number of possible orderings of n elements. (There are n possibilities for the first position, 
and for each of those there are n-1 remaining for the second, and so forth.) 

If this is still about as clear as mud, don’t worry. You’ll encounter powers and logarithms repeatedly 
throughout the book, in rather concrete settings, where their meanings should be understandable. 
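
The identities in this refresher are easy to try out in the interpreter. Here is a small sketch; the function name p and the sample values are arbitrary choices, and math.log2 is used as a convenient base-2 variant of the log function.

```python
import math
from itertools import permutations

print(3**2)                         # 9
print((3**2)**4 == 3**8)            # True: nested exponents multiply

def p(x):
    """The example polynomial 9x^5 + 2x^2 + x + 3."""
    return 9*x**5 + 2*x**2 + x + 3

print(p(2))                         # 301

print(math.sqrt(16), 16**0.5)       # 4.0 4.0: two ways to get a square root

k = 10
n = 2**k
print(math.log2(n))                 # 10.0: the exponent needed for base 2

print(math.factorial(4))            # 24
print(len(list(permutations("abcd"))))  # 24 orderings of four elements
```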



Summary 

This chapter started with some important foundational concepts, defining (somewhat loosely) the 
notions of algorithms, (abstract) computers, and problems. This was followed by the two main topics, 
asymptotic notation and graphs. Asymptotic notation is used to describe the growth of a function; it lets 
us ignore irrelevant additive and multiplicative constants and focus on the dominating part. This allows 
us to evaluate the salient features of the running time of an algorithm in the abstract, without worrying 
about the specifics of a given implementation. The three Greek letters O, Ω, and Θ give us upper, lower, 
and combined asymptotic limits, and each can be used on either of the best-case, worst-case, or 
average-case behavior of an algorithm. As a supplement to this theoretical analysis, I gave you some 
brief guidelines for testing your program. 







CHAPTER 2 THE BASICS 



Graphs are abstract mathematical objects, used to represent all kinds of network structures. They 
consist of a set of nodes, connected by edges, and the edges can have properties such as direction and 
weight. Graph theory has an extensive vocabulary, and a lot of it is summed up in Appendix C. The 
second part of the chapter dealt with representing these structures in actual Python programs, primarily 
using variations of adjacency lists and adjacency matrices, implemented with various combinations of 
list, dict, and set. 

Finally, there was a section about the dangers of black boxes. You should look around for potential 
traps — things you use without knowing how they work. For example, some rather straightforward uses of 
built-in Python functions can give you a quadratic running time rather than a linear one. Profiling your 
program can, perhaps, uncover such performance problems. There are traps related to accuracy as well. 
Careless use of floating-point numbers, for example, can give you inaccurate answers. If it’s critical to get 
an accurate answer, the best solution may be to calculate it with two separately implemented programs, 
comparing the results. 



If You’re Curious ... 

If you want to know more about Turing machines and the basics of computation, you might like The 
Annotated Turing, by Charles Petzold. It’s structured as an annotated version of Turing’s original paper, 
but most of the contents are Petzold’s explanations of the main concepts, with lots of examples. It’s a 
great intro to the topic. For a fundamental textbook on computation, you could take a look at Elements 
of the Theory of Computation by Lewis and Papadimitriou. For an easy-to-read, wide-ranging popular 
introduction to the basic concepts of algorithmics, I recommend Algorithmic Adventures: From 
Knowledge to Magic, by Juraj Hromkovič. For more specifics on asymptotic analysis, a solid textbook, 
such as one of those discussed in Chapter 1, would probably be a good idea. (The book by Cormen et al. 
is considered a good reference work for this sort of thing.) You can certainly also find a lot of good 
information online (such as in Wikipedia 19 ), but you should double-check the information before relying 
on it for anything important, of course. (If you want some historical background, you could read Donald 
Knuth’s paper “Big Omicron and big Omega and big Theta,” from 1976.) 

For some specifics on the perils and practices of algorithmic experiments, there are several good 
papers, such as “Towards a discipline of experimental algorithmics,” “On comparing classifiers,” “Don’t 
compare averages,” “How not to lie with statistics,” “Presenting data from experiments in algorithmics,” 
“Visual presentation of data by means of box plots,” and “Using finite experiments to study asymptotic 
performance” (details in the “References” section). For visualizing data, take a look at Beginning Python 
Visualization by Shai Vaingast. 

There are many textbooks on graph theory. Some are rather technical and advanced (such as those 
by Bang-Jensen and Gutin, Bondy and Murty, or Diestel, for example), and some are quite readable, 
even for the novice mathematician (such as the one by West). There are even specialized books on, say, 
types of graphs (Brandstädt et al., 1999) or graph representations (Spinrad, 2003). If this is a topic that 
interests you, you shouldn’t have any trouble finding lots of material, either in books or online. For more 
on best practices when using floating-point numbers, take a look at Foreman S. Acton’s Real Computing 
Made Real: Preventing Errors in Scientific and Engineering Calculations. 



19 http://wikipedia.org 
Exercises 

2-1. When constructing a multidimensional array using Python lists, you need to use for loops (or 
something equivalent, such as list comprehension). Why would it be problematic to create a 10x10 array 
with the expression [[0]*10]*10? 

2-2. Assume (perhaps a bit unrealistically) that allocating a block of memory takes constant time, as long 
as you leave it uninitialized (that is, it contains whatever arbitrary “junk” was left there the last time it 
was used). You want an array of n integers, and you want to keep track of whether each entry is 
uninitialized or whether it contains a number you put there. This is a check you want to be able to do in 
constant time for any entry. How would you do this with only constant time for initialization? (And how 
could you use this to initialize an empty adjacency array in constant time, thereby avoiding an otherwise 
obligatory quadratic minimum running time?) 

2-3. Show that O and Ω are inverses of one another, that is, if f is O(g), then g is Ω(f), and vice versa. 

2-4. Logarithms can have different bases, but algorists don’t usually care. To see why, consider the 
equation $\log_b n = (\log_a n) / (\log_a b)$. First, can you see why this is true? Second, why does this mean that we 
usually don’t worry about bases? 

2-5. Show that any increasing exponential ($\Theta(k^n)$ for k > 1) dominates any polynomial ($\Theta(n^j)$ for j > 0). 

2-6. Show that any polynomial (that is, $\Theta(n^k)$, for any constant k > 0) asymptotically dominates any 
logarithm (that is, $\Theta(\lg n)$). (Note that the polynomials here include, for example, the square root, for 
k = 0.5.) 

2-7. Research or conjecture the asymptotic complexity of various operations on Python lists, such as 
indexing, item assignment, reversing, appending, and inserting (the latter two discussed in the black box 
sidebar on list). How would these be different in a linked list implementation? What about, for 
example, list.extend? 

2-8. Show that the expressions $\Theta(f) + \Theta(g) = \Theta(f + g)$ and $\Theta(f) \cdot \Theta(g) = \Theta(f \cdot g)$ are correct. Also, try your 
hand at $\max(\Theta(f), \Theta(g)) = \Theta(\max(f, g)) = \Theta(f + g)$. 

2-9. In Appendix C, you’ll find a numbered list of statements about trees. Show that they are equivalent. 

2-10. Let T be an arbitrary rooted tree with at least three nodes, where each internal node has exactly two 
children. If T has n leaves, how many internal nodes does it have? 

2-11. Show that a DAG can have any (underlying) structure whatsoever. Put differently, any (undirected) 
graph can be the underlying graph for a DAG, or, given a graph, you can always orient its edges so that 
the resulting digraph is a DAG. 

2-12. Consider the following graph representation: you use a dictionary and let each key be a pair (tuple) 
of two nodes, with the corresponding value set to the edge weight. For example W[u, v] = 42. What 
would be the advantages and disadvantages of this representation? Could you supplement it to mitigate 
the downsides? 
Counting 101 



The greatest shortcoming of the human race is our inability to understand the 
exponential function. 



— Dr. Albert A. Bartlett, World Population Balance Board of Advisors 

At one time, when the famous mathematician Carl Friedrich Gauss was in primary school, his teacher 
asked the pupils to add all the integers from 1 to 100 (or, at least, that’s the most common version of the 
story). No doubt, the teacher expected this to occupy his students for a while, but Gauss produced the 
result almost immediately. This might seem to require lightning-fast mental arithmetic, but the truth is, 
the actual calculation needed is quite simple; the trick is really understanding the problem. 

After the previous chapter, you may have become a bit jaded about such things. “Obviously, the 
answer is Θ(1),” you say. Well, yes ... but let’s say we were to sum the integers from 1 to n? The following 
sections deal with some important problems like this, which will crop up again and again in the analysis 
of algorithms. The chapter may be a bit challenging at times, but the ideas presented are crucial and well 
worth the effort. They’ll make the rest of the book that much easier to understand. First, I’ll give you a 
brief explanation of the concept of sums and some basic ways of manipulating them. Then come the two 
major sections of the chapter: one on two fundamental sums (or combinatorial problems, depending on 
your perspective) and the other on so-called recurrence relations, which you’ll need to analyze recursive 
algorithms later. Between these two is a little section on subsets, combinations, and permutations. 



Tip There’s quite a bit of math in this chapter. If that’s not your thing, you might want to skim it for now and 
come back to it as needed while reading the rest of the book. (Several of the ideas in this chapter will probably 
make the rest of the book easier to understand, though.) 



The Skinny on Sums 

In Chapter 2, I explained that when two loops are nested and the complexity of the inner one varies from 
iteration to iteration of the outer one, you need to start summing. In fact, sums crop up all over the place 
in algorithmics, so you might as well get used to thinking about them. Let’s start with the basic notation. 
More Greek 

In Python, you might write the following: 

x*sum(S) == sum(x*y for y in S) 

With mathematical notation, you’d write this: 

$$x \cdot \sum_{y \in S} y = \sum_{y \in S} xy$$

(Can you see why this equation is true?) This capital sigma can seem a bit intimidating if you haven’t 
worked with it before. It is, however, no scarier than the sum function in Python; the syntax is just a bit 
different. The sigma itself indicates that we’re doing a sum, and we place information about what to sum 
above, below, and to the right of it. What we place to the right (in the previous example, y and xy) are the 
values to sum, while we put a description of which items to iterate over below the sigma. 

Instead of just iterating over objects in a set (or other collection), we can supply limits to the sum, 
like with range (except that both limits are inclusive). The general expression “sum f(i) for i = m to n” is 
written like this: 

$$\sum_{i=m}^{n} f(i)$$

The Python equivalent would be as follows: 

sum(f(i) for i in range(m, n+1)) 

It might be even easier for many programmers to think of these sums as a mathematical way of 
writing loops: 

s = 0 
for i in range(m, n+1): 
    s += f(i) 

The more compact mathematical notation has the advantage of giving us a better overview of what’s 
going on. 



Working with Sums 

The sample equation in the previous section, where the factor x was moved inside the sum, is just one of 
several useful “manipulation rules” you’re allowed to use when working with sums. Here’s a summary of 
two of the most important ones (for our purposes): 

$$c \cdot \sum_{i=m}^{n} f(i) = \sum_{i=m}^{n} c \cdot f(i)$$

Multiplicative constants can be moved in or out of sums. That’s also what the initial example in the 
previous section illustrated. This is the same rule of distributivity that you’ve seen in simpler sums many 
times: c(f(m) + ... + f(n)) = cf(m) + ... + cf(n). 

$$\sum_{i=m}^{n} f(i) + \sum_{i=m}^{n} g(i) = \sum_{i=m}^{n} \bigl(f(i) + g(i)\bigr)$$

Instead of adding two sums, you can sum their added contents. This just means that if you’re going to 
sum up a bunch of stuff, it doesn’t matter how you do it; that is, 

sum(f(i) for i in seq) + sum(g(i) for i in seq) 

is exactly the same as sum(f(i) + g(i) for i in seq). This is just an instance of associativity. If you want to 
subtract two sums, you can use the same trick. (If you want, you can pretend you’re moving the constant 
factor -1 into the second sum.) 
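
Both manipulation rules can be tried out directly. In this sketch, the functions f and g, the limits m and n, and the constant c are arbitrary choices made only to have something concrete to sum; integers are used so the comparisons are exact.

```python
def f(i):
    return i * i

def g(i):
    return 3 * i + 1

m, n, c = 2, 9, 5
indices = range(m, n + 1)          # both limits inclusive, as in the notation

# Rule 1: multiplicative constants can be moved in or out of sums.
print(c * sum(f(i) for i in indices) == sum(c * f(i) for i in indices))

# Rule 2: adding two sums is the same as summing the added contents.
print(sum(f(i) for i in indices) + sum(g(i) for i in indices)
      == sum(f(i) + g(i) for i in indices))

# Subtraction: move the constant factor -1 into the second sum.
print(sum(f(i) for i in indices) - sum(g(i) for i in indices)
      == sum(f(i) - g(i) for i in indices))
```

All three lines print True. A check on sample values is no proof, of course, but it can be a useful sanity test when you manipulate a sum by hand.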



A Tale of Two Tournaments 

There are plenty of sums that you might find useful in your work, and a good mathematics reference 
will probably give you the solution to most of them. There are, however, two sums, or combinatorial 
problems, that cover the majority of the cases you'll meet in this book — or, indeed, most basic 
algorithm work. 

I’ve been explaining these two ideas repeatedly over the years, using many different examples and 
metaphors, but I think one rather memorable (and I hope understandable) way of presenting them is as 
two forms of tournaments. 



Note There is, actually, a technical meaning of the word tournament in graph theory (a complete graph, where 
each edge is assigned a direction). That’s not what I’m talking about here (although the concepts are related). 



Although there are many types of tournaments, let’s consider two rather common ones, with rather 
catchy names. These are the round-robin tournament and the knockout tournament. 

In a round-robin tournament (or, specifically, a single round-robin tournament), each contestant 
meets each of the others in turn. The question then becomes, how many matches or fixtures do we need, 
if we have, for example, n knights jousting? (Substitute your favorite competitive activity here, if you 
want.) In a knockout tournament, the competitors are arranged in pairs, and only the winner from each 
pair goes on to the next round. Here there are more questions to ask: for n knights, how many rounds do 
we need, and how many matches will there be, in total? 



Shaking Hands 

The round-robin problem is exactly equivalent to another well-known puzzler: if you have n algorists 
meeting at a conference and they all shake hands, how many handshakes do you get? Or, equivalently, 
how many edges are there in a complete graph with n nodes (see Figure 3-1)? It’s the same count you get 
in any kind of “all against all” situation. For example, if you have n locations on a map and want to find 
the two that are closest to each other, the simple (brute-force) approach would be to compare all points 
with all others. To find the running time of this algorithm, you need to solve the round-robin problem. 

(A more efficient solution to this closest pair problem is presented in Chapter 6.) 
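
The brute-force approach to the closest pair problem can be sketched in a few lines. The function name closest_pair and the sample points are assumptions made for illustration; math.dist requires Python 3.8 or later. The function also counts how many pairs it compares, which is exactly the round-robin count for that number of points.

```python
from itertools import combinations
from math import dist

def closest_pair(points):
    """Compare every point with every other; return the best pair and the work done."""
    comparisons = 0
    best = None
    for p, q in combinations(points, 2):   # each unordered pair exactly once
        comparisons += 1
        d = dist(p, q)
        if best is None or d < best[0]:
            best = (d, p, q)
    return best, comparisons

points = [(0, 0), (5, 5), (1, 0), (9, 2), (5, 9)]
best, comparisons = closest_pair(points)
print(best)          # (1.0, (0, 0), (1, 0))
print(comparisons)   # 10 comparisons for 5 points
```

Try it with more points and watch how the number of comparisons grows; that growth is what the rest of this section works out.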